Question about updates to blk* files

TierNolan

New Member
Nov 19, 2018
12
7
My understanding of the blk* files is that they operate on a write once basis (at least once the system moves to the next blk file).

Is that correct? I wiped the blocks directory and then did a full download + resync. Most of the files have modification times before the end of the resync.

blk00000.dat has a date after the resync finished, and so do a few others.

These are the last few lines of "ls -lrt" for the /blocks directory. The download finished around 13:14.

Code:
-rw------- 1 tiern tiern  19922944 Nov 19 06:48 rev01041.dat
-rw------- 1 tiern tiern  18874368 Nov 19 06:48 rev01040.dat
-rw------- 1 tiern tiern  83886080 Nov 19 13:14 blk01042.dat
-rw------- 1 tiern tiern  11534336 Nov 19 13:14 rev01042.dat
-rw------- 1 tiern tiern 134215061 Nov 19 15:57 blk00001.dat
-rw------- 1 tiern tiern 134215184 Nov 19 15:57 blk00004.dat
-rw------- 1 tiern tiern 134216866 Nov 19 15:57 blk00006.dat
-rw------- 1 tiern tiern 134213458 Nov 19 15:57 blk00007.dat
-rw------- 1 tiern tiern 134198809 Nov 19 15:57 blk00008.dat
-rw------- 1 tiern tiern 134207767 Nov 19 15:57 blk00009.dat
-rw------- 1 tiern tiern  18230116 Nov 19 15:57 rev00006.dat
-rw------- 1 tiern tiern  18383799 Nov 19 15:57 rev00008.dat
-rw------- 1 tiern tiern  17854605 Nov 19 15:57 rev00007.dat
-rw------- 1 tiern tiern  16835027 Nov 19 15:57 rev00004.dat
-rw------- 1 tiern tiern  16968423 Nov 19 15:57 rev00001.dat
-rw------- 1 tiern tiern  18364187 Nov 19 15:57 rev00009.dat
-rw------- 1 tiern tiern 134192280 Nov 19 16:15 blk00010.dat
-rw------- 1 tiern tiern  17973811 Nov 19 16:15 rev00010.dat
-rw------- 1 tiern tiern  18432847 Nov 19 18:18 rev00012.dat
-rw------- 1 tiern tiern 134208491 Nov 19 18:18 blk00012.dat
-rw------- 1 tiern tiern 134171857 Nov 19 19:30 blk00013.dat
-rw------- 1 tiern tiern  18874368 Nov 19 19:30 rev00013.dat
drwx------ 2 tiern tiern  4096 Nov 19 19:37 index
-rw------- 1 tiern tiern 134215291 Nov 19 19:47 blk00000.dat
-rw------- 1 tiern tiern  19506364 Nov 19 19:47 rev00000.dat
-rw------- 1 tiern tiern 134059558 Nov 19 20:00 blk00096.dat
-rw------- 1 tiern tiern  18753780 Nov 19 20:00 rev00096.dat
-rw------- 1 tiern tiern 134067333 Nov 19 20:04 blk00113.dat
-rw------- 1 tiern tiern  18568307 Nov 19 20:04 rev00113.dat
-rw------- 1 tiern tiern 133896127 Nov 19 20:18 blk00115.dat
-rw------- 1 tiern tiern  18697397 Nov 19 20:18 rev00115.dat
-rw------- 1 tiern tiern  18874368 Nov 19 20:19 rev00985.dat
-rw------- 1 tiern tiern 129267949 Nov 19 20:19 blk00985.dat
It looks like the changes are making the files larger. Does the database remember the size of each blk file and then add new blocks to any earlier blk file that has space?
 
Reactions: solex

solex

Moderator
Staff member
Aug 22, 2015
1,558
4,695
Nice to see you here @TierNolan. I remember your posts well on bitcointalk when Bitcoin was much more united than today.
I can't answer your question, but hopefully @theZerg or @Peter Tschipper can...
Can you advise which version of BU you are using?
 

TierNolan

New Member
Nov 19, 2018
12
7
Thanks for the welcome. I was looking at the Armory code to try to keep it working with Bitcoin Cash.

It only requires changing the magic pattern for P2P at the moment. This is a change to the Python code, so it is reasonably low difficulty.

Bitcoin Cash clients seem to keep the old magic pattern for the disk files, so that part can be left unchanged.

Armory's security assumption is to trust the node it is connected to. This means that it doesn't do any checks that would break things. The only thing it uses the P2P message for is to be notified that a new block arrived. It reads the block from the blk*.dat files.
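To make the on-disk layout concrete, here is a minimal sketch of walking the records in one blk*.dat file. It is not Armory's code. It assumes the usual record layout (4-byte magic, 4-byte little-endian length, then the raw block) and the Bitcoin mainnet magic f9beb4d9 for the disk files; `iter_block_records` is a made-up name for illustration.

```python
import struct

# Assumption: Bitcoin mainnet disk magic, which (per the post above)
# Bitcoin Cash clients keep for the blk*.dat files.
DISK_MAGIC = bytes.fromhex("f9beb4d9")


def iter_block_records(data, magic=DISK_MAGIC):
    """Yield (offset, raw_block) for each record in one blk file's contents.

    Each record is: 4-byte magic, 4-byte little-endian length, block bytes.
    Scanning stops at the first position that does not start with the magic
    (for example zero padding from preallocation) or at a truncated record.
    """
    pos = 0
    while pos + 8 <= len(data):
        if data[pos:pos + 4] != magic:
            break
        (size,) = struct.unpack_from("<I", data, pos + 4)
        start = pos + 8
        end = start + size
        if end > len(data):
            break  # record not fully written yet
        yield start, data[start:end]
        pos = end
```

A reader that remembers the last offset it processed in each file can resume from there, which only works if each file is append-only.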

I made some other comments at the Armory forum on bitcointalk after looking at the Bitcoin Unlimited code.

I think this applies to the other codebases too, since the relevant code was written in 2012. I haven't actually seen the effect in any other codebase though.

It seems that when saving a new block to disk, the node checks all existing blk files to see if they have space. It slots the block into the first one (lowest index) that has space.

It normally doesn't go backwards because of a variable held in RAM. Once it writes to blk00100.dat, it will never write to any file with an index below 100. Since the variable is only in RAM, a node restart will cause it to start checking from index 0 again.

I think that Bitcoin Cash's more variable block size may have exposed this behavior. If the node gets a 2MB block but only has 1.5MB of space, the node will move on to the next index.

On the next restart, if the first block received is 50kB, then it will be placed in that blk file, rather than appended.
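The behaviour described above can be modelled with a small simulation. This is only a sketch of my reading of it, not the node's actual code: `BlockFileModel` and its methods are invented names, the 128 MiB cap is an assumption based on the file sizes in the listing, and the 8-byte record header is ignored.

```python
MAX_BLOCKFILE_SIZE = 128 * 1024 * 1024  # assumed per-file cap (128 MiB)


class BlockFileModel:
    """Toy model: pick the first blk file, starting at last_file, with room."""

    def __init__(self, file_sizes=None):
        self.file_sizes = list(file_sizes or [0])  # bytes used in each blk file
        self.last_file = 0  # held in RAM only, so a restart resets it

    def restart(self):
        # File sizes survive on disk; the in-RAM index does not.
        return BlockFileModel(self.file_sizes)

    def find_block_pos(self, block_size):
        n = self.last_file
        while True:
            if n == len(self.file_sizes):
                self.file_sizes.append(0)  # open a new blk file
            if self.file_sizes[n] + block_size < MAX_BLOCKFILE_SIZE:
                break
            n += 1
        offset = self.file_sizes[n]
        self.file_sizes[n] += block_size
        self.last_file = n
        return n, offset
```

In this model, a 2MB block that arrives when file 0 has 1.5MB free goes to file 1, and a later 50kB block follows it there. If the node is restarted first, the 50kB block lands in the gap at the end of file 0 instead.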

With a near constant stream of around 1MB blocks, that is less likely to happen with the Bitcoin chain. I think it could still happen though.

A 0.99MB block with 0.98MB of space could leave a space and then the first block after a restart would need to be < 0.98MB.

Armory assumes that the blk files are an append-only filesystem. This is true for each blk file, but doesn't appear true for the system as a whole.

I think they should change their scan to look at the last modified times of the blk files. It is the only way to be compatible. The alternative is to tell people to wipe the Armory database directory if their client gets stuck.
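A scan along those lines might look like the sketch below. It is an illustration only, not Armory's code: `changed_blk_files` is a hypothetical helper, and it compares both size and modification time against the values recorded on the previous scan.

```python
import os
import re

BLK_NAME = re.compile(r"^blk\d+\.dat$")


def changed_blk_files(blocks_dir, last_scan):
    """Return (changed_names, new_scan) for the blk files in blocks_dir.

    last_scan maps filename -> (size, mtime_ns) from an earlier call; pass
    an empty dict on the first scan. A file counts as changed if it is new
    or if its size or modification time differs from the recorded value.
    """
    new_scan = {}
    changed = []
    with os.scandir(blocks_dir) as entries:
        for entry in entries:
            if not BLK_NAME.match(entry.name):
                continue  # skip rev*.dat, the index directory, etc.
            st = entry.stat()
            state = (st.st_size, st.st_mtime_ns)
            new_scan[entry.name] = state
            if last_scan.get(entry.name) != state:
                changed.append(entry.name)
    return sorted(changed), new_scan
```

Only the files returned as changed would need to be rescanned, rather than assuming new blocks can appear only in the highest-numbered file.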

[Edit]
I did some further checking and I think this effect is due to the changes to blockstorage.cpp. The call to ReadLastBlockFile doesn't change the nLastBlockFile global variable. It just targets a local variable in the function.

This means the variable stays at its default of zero.

This is what allows the node to write to previously completed blk*.dat files rather than only looking at new blk files.

In the Bitcoin Core client, the call to ReadLastBlockFile has a side effect that initializes the nLastBlockFile variable.

This is a Bitcoin Unlimited only behavior, I think, since Bitcoin ABC hasn't made the changes.
 
Reactions: torusJKL

sickpig

Active Member
Aug 28, 2015
926
2,541
@TierNolan thanks for the notification. I think that @Griffith is the last one that touched that part of the code base. Will ask him to have a look at the issue you opened on github. Thanks again for reviewing our code.
 
Reactions: solex

Griffith

Active Member
Jun 5, 2017
188
157
@TierNolan I looked into your issue and it seems you are just looking at the wrong part of the code. The code you found is for syncing between different storage methods (changing from leveldb to blk files and vice versa), since we support more than one.

loadedblockfile is used inside SyncStorage(), which is called here: https://github.com/BitcoinUnlimited/BitcoinUnlimited/blob/dev/src/validation/validation.cpp#L343 and is thrown away at the end of the method. The global variable nLastBlockFile is initialized about 100 lines later, here: https://github.com/BitcoinUnlimited/BitcoinUnlimited/blob/dev/src/validation/validation.cpp#L447

(posted same response on the github issue thread)
 
Reactions: solex

TierNolan

New Member
Nov 19, 2018
12
7
I updated the github thread. You are right. Looking again, I think the problem is that the variable isn't written to the database at all.