How to safeguard datadir when switching clients on a node

freetrader

Dec 16, 2015
I have experimented a little with how to make a backup (or perhaps better - a snapshot) of my ~/.bitcoin/ datadir so that when I test a new client which potentially adds incompatible blocks on top of the current chain, I can later switch back quickly to the old client version and the old state of the datadir.

This is useful for testing during protocol development, where I don't want to have to back up and restore the entire 70 GB blockchain datadir every time, much less re-index it.

So far, I have a proof-of-concept backup script that works quite well.

In this post I'll first explain how it works, and I'd be happy to get your input on the concept.
I'll polish the backup script up a bit (perhaps adding your suggestions) and add a restore script too, as restore is so far a manual process which is a little too prone to mistakes.
Then I'll release the scripts on github for others to review and modify as they see fit.

---

How it works:

Backing up

Essentially, the script backs up only the most recent block data in blocks/ that is still changing (i.e. the two most recent blk*.dat and rev*.dat files). The rest of the blk*.dat and rev*.dat files don't seem to change, so they are left as-is under the assumption (!) that my tests will also not disturb them.

NOTE: If your tests would modify older block database files, then this backup tool would not be sufficient for your use case! Same goes for pruning operations or other database compaction (does that happen?) which may be done after a backup. For those cases, always keep a full backup of the datadir somewhere handy!

Apart from the most recent blocks data, the script also backs up:
- the entire blocks/index/ contents
- the entire chainstate/ folder
- files located directly in the datadir itself, with some exceptions: no debug.log files, no bitcoin.conf or bitcoin.pid files, and no LOCK files. That means it includes peers.dat, wallet.dat, banlist.dat etc. bitcoin.conf is currently excluded because it is assumed that you might want to keep it around as-is, and not restore over it by accident.
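
The selection rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the actual script is not shown in this post, so the function name select_backup_paths, the constant names and the recent=2 parameter are hypothetical, and the exclusions are applied only to files sitting directly in the datadir.

```python
import re
from pathlib import Path

# Assumption: names skipped at the top level of the datadir, per the list above.
EXCLUDED_TOP_LEVEL = {"bitcoin.conf", "bitcoin.pid", "LOCK"}
BLOCK_FILE_RE = re.compile(r"^(blk|rev)(\d+)\.dat$")


def select_backup_paths(datadir, recent=2):
    """Return the paths (relative to datadir) that a snapshot would contain."""
    datadir = Path(datadir)
    blocks = datadir / "blocks"
    selected = set()

    # 1. The `recent` highest-numbered blk*.dat and rev*.dat files.
    if blocks.is_dir():
        for prefix in ("blk", "rev"):
            numbered = []
            for p in blocks.glob(prefix + "*.dat"):
                m = BLOCK_FILE_RE.match(p.name)
                if m:
                    numbered.append((int(m.group(2)), p))
            numbered.sort()
            for _, p in numbered[-recent:]:
                selected.add(p.relative_to(datadir))

    # 2. Every file under blocks/index/ and chainstate/.
    for sub in (blocks / "index", datadir / "chainstate"):
        if sub.is_dir():
            for p in sub.rglob("*"):
                if p.is_file():
                    selected.add(p.relative_to(datadir))

    # 3. Plain files directly in the datadir, minus the exclusions.
    for p in datadir.iterdir():
        if not p.is_file():
            continue
        if p.name in EXCLUDED_TOP_LEVEL or p.name.startswith("debug.log"):
            continue
        selected.add(p.relative_to(datadir))

    return sorted(selected)
```

Calling select_backup_paths(Path.home() / ".bitcoin") only lists candidates and touches nothing, so it can be run against a live datadir to check what a snapshot would pick up.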

The end result is a timestamped backup tarball. It is not compressed because the data does not compress well.
But it is beautifully small - on my system a current snapshot of this kind takes around 1.5G.
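
As a rough illustration of the tarball step, here is a minimal Python sketch that writes an uncompressed, timestamped archive. The function name write_snapshot, the datadir-snapshot-<timestamp>.tar naming scheme and the `now` parameter are assumptions made for this example, not the actual script. Members are stored under the datadir's own folder name, on the assumption that the archive is later unpacked from the datadir's parent directory.

```python
import tarfile
import time
from pathlib import Path


def write_snapshot(datadir, relative_paths, dest_dir, now=None):
    """Write an uncompressed, timestamped tarball and return its path.

    relative_paths are paths relative to datadir; now is an optional
    epoch timestamp (defaults to the current time).
    """
    datadir = Path(datadir).absolute()
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    # Hypothetical naming scheme, chosen only for this sketch.
    tar_path = Path(dest_dir) / f"datadir-snapshot-{stamp}.tar"

    # Mode "w" writes a plain tar archive with no compression.
    with tarfile.open(tar_path, "w") as tar:
        for rel in relative_paths:
            # Prefix each member with the datadir's folder name so the
            # archive unpacks correctly from the datadir's parent directory.
            arcname = (Path(datadir.name) / rel).as_posix()
            tar.add(datadir / rel, arcname=arcname, recursive=False)
    return tar_path
```

The client should not be running while the files are read, otherwise the index and chainstate databases may be captured in an inconsistent state.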

Restoring from backup

This is currently a manual process which I still need to script.

What I do is the following:

- shut down the running experimental client
- remove recursively the blocks/index/ and chainstate/ folders (as they will be completely restored from backup)
- remove any newer blk*.dat and rev*.dat files that have been created since the backup was made (they will have a higher number in their filename). Be very careful not to accidentally remove any older block files that are not contained in the snapshot, otherwise you'll need to restore them from some other backup.
- remove any other new files beneath the datadir that the experimental client created (e.g. hashcache.dat and forked_peers.dat created post-fork by Satoshi's Bitcoin client)
- go to the parent directory of your datadir, and untar the backup tarball. This will re-create the index, chainstate and overwrite the last-modified block db files, putting you back exactly where you were in terms of the blockchain.
- check that bitcoin.conf and debug.log are correct and clean
- start the stable client again; it should sync up fine
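
The mechanical part of these steps could be scripted roughly as below. This is a hedged sketch, not the restore script itself: the name restore_snapshot and the dry_run flag are hypothetical, and it assumes the tarball members are stored under the datadir's folder name (e.g. .bitcoin/blocks/...). Shutting down the client, removing other files created by the experimental client, and checking bitcoin.conf and debug.log remain manual.

```python
import re
import shutil
import tarfile
from pathlib import Path

BLOCK_FILE_RE = re.compile(r"^(blk|rev)(\d+)\.dat$")


def restore_snapshot(datadir, tar_path, dry_run=True):
    """Roll datadir back to a snapshot; return the paths (to be) removed."""
    datadir = Path(datadir).absolute()
    prefix = datadir.name + "/"
    removed = []

    with tarfile.open(tar_path, "r") as tar:
        names = tar.getnames()
        # Assumption: every member lives under the datadir's folder name.
        if not all(n == datadir.name or n.startswith(prefix) for n in names):
            raise ValueError("tarball does not match datadir name")

        # Highest block-file number per prefix (blk/rev) in the snapshot.
        newest = {}
        for name in names:
            m = BLOCK_FILE_RE.match(Path(name).name)
            if m:
                kind, num = m.group(1), int(m.group(2))
                newest[kind] = max(newest.get(kind, -1), num)

        # blocks/index/ and chainstate/ are restored wholesale, so drop them.
        for sub in (datadir / "blocks" / "index", datadir / "chainstate"):
            if sub.exists():
                removed.append(sub)
                if not dry_run:
                    shutil.rmtree(sub)

        # Remove only block files numbered higher than anything in the
        # snapshot; older files are never touched.
        for p in sorted((datadir / "blocks").glob("*.dat")):
            m = BLOCK_FILE_RE.match(p.name)
            if m and m.group(1) in newest and int(m.group(2)) > newest[m.group(1)]:
                removed.append(p)
                if not dry_run:
                    p.unlink()

        # Unpack from the parent directory, overwriting the changed files.
        if not dry_run:
            tar.extractall(path=datadir.parent)

    return removed
```

The default dry_run=True only reports what would be deleted, which seems a sensible guard given how easy it is to remove the wrong block files by hand. Only use this with tarballs you created yourself.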

I have tested this procedure once, recovering from the public test run of SatoshisBitcoin to my default client installation, on a VPS which would not have had enough space for a full backup.

Limitations

Since the backups are incremental, you might need to restore multiple ones to recover to some intermediate stage. It may be possible to script this to combine (merge) the incremental backup files in order to create a single backup for a later stage, but I don't plan on writing a script for that right now.

Disclaimer

You alone are responsible for your blockchain and wallet data, even if you follow my description or run the scripts. Always have proper full backups, and test on test systems. /obviously
 