a node depends on other nodes for data
so if you ask for just one block at a time, then you are lucky to be actively using the connection 25% of the time
so many implementations ask for 100 to 500 blocks or even 2000 blocks all at once
then they might come in, all of them, or maybe some, or none
but if any one of them doesn't come in, then all processing is stuck
so this is like the queues at the airport
imagine if there was just one queue
that is how all the other implementations do things
iguana has an arbitrary number of queues settable via the addcoin JSON
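to make the airport analogy concrete, here is a toy model in Python, it is not iguana's code. it assumes a queue stalls at its first missing block, and the function name processable and the queue_size parameter are made up for this illustration

```python
def processable(heights, arrived, queue_size):
    """Return the block heights that can be processed right now.

    Toy assumption: blocks are split into queues of queue_size, and each
    queue handles its blocks in order, stalling at its first missing block.
    """
    done = []
    for start in range(0, len(heights), queue_size):
        for h in heights[start:start + queue_size]:
            if h not in arrived:
                break  # only this queue is stuck, the others keep going
            done.append(h)
    return done


heights = list(range(12))
arrived = set(heights) - {1}  # block 1 never came in

# one big queue: everything after block 1 is stuck behind it
single = processable(heights, arrived, queue_size=len(heights))

# three queues of four blocks: only the first queue is stuck
multi = processable(heights, arrived, queue_size=4)
```

with one queue only block 0 can be processed, with three queues the missing block holds up just the blocks behind it in its own queue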
the data is first put into a "raw" file per block, which is actually a nano-ramchain of a special format
this is done without any interlocks by all the peer processes directly, so not even a context switch
there are helper threads ("numhelpers" in the JSON), and these helpers make the multiple bundle queues
each bundle goes from the initial state to where it is 100% done, passing through several steps
first step is to get the header for all the blocks
then the next step is to get all the data (via the parallel peer process above)
next, the bundle's worth of files is combined into a single ramchain file, which resolves any resolvable references
it is like linking a program, but it could still have unresolved references, i.e. a vin that refers to a txid not inside this bundle
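the linking step can be sketched roughly like this in Python. this is only an illustration under simplifying assumptions: a tx is just a (txid, list of spent txids) pair, and link_bundle is a hypothetical name, not a function from the iguana source or the real ramchain format

```python
def link_bundle(txs):
    """Resolve vin references that point inside the same bundle.

    txs is a list of (txid, [spent_txid, ...]) in block order.
    Returns (resolved, unresolved):
      resolved   maps (tx_index, vin_index) -> index of the spent tx
      unresolved lists (tx_index, vin_index, spent_txid) for anything
                 that refers to a txid outside this bundle
    """
    index_of = {}  # txid -> position within the bundle
    resolved = {}
    unresolved = []
    for i, (txid, vins) in enumerate(txs):
        for j, prev_txid in enumerate(vins):
            if prev_txid in index_of:
                resolved[(i, j)] = index_of[prev_txid]
            else:
                unresolved.append((i, j, prev_txid))
        index_of[txid] = i
    return resolved, unresolved


# "z" is not in this bundle, so that vin stays unresolved for now
bundle = [("a", []), ("b", ["a"]), ("c", ["z", "b"])]
resolved, unresolved = link_bundle(bundle)
```

the unresolved list is what has to be carried forward to the later pass that can see prior bundles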
now the bundle waits for all prior bundles to get at least the second pass data
at this point it is ready to calculate the fully dereferenced spend vectors, as all txids that are referenced must be findable in some prior bundle. this is the third pass data
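a minimal sketch of this third pass, assuming each prior bundle exposes a txid lookup table. the names resolve_spends and prior_bundles are hypothetical, and the newest-first search order is just a choice made for the example

```python
def resolve_spends(unresolved, prior_bundles):
    """Dereference vins that point into earlier bundles.

    unresolved    list of (tx_index, vin_index, spent_txid)
    prior_bundles list of dicts (txid -> tx index), oldest bundle first
    Returns a list of (tx_index, vin_index, bundle_index, spent_tx_index).
    """
    spends = []
    for tx_i, vin_j, prev_txid in unresolved:
        # search newest prior bundle first (an assumption of this sketch)
        for b in range(len(prior_bundles) - 1, -1, -1):
            if prev_txid in prior_bundles[b]:
                spends.append((tx_i, vin_j, b, prior_bundles[b][prev_txid]))
                break
        else:
            # every referenced txid must be findable in some prior bundle
            raise KeyError(prev_txid)
    return spends


prior = [{"x": 0, "z": 3}, {"y": 1}]
spend_vector = resolve_spends([(2, 0, "z")], prior)
```

a txid that cannot be found in any prior bundle is an error here, which matches the requirement that all prior bundles have at least their second pass data before this step runs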
now, it is not necessary to wait for all bundles to have the third pass data before updating balances, but I do that since it is relatively little processing and it creates a deterministic path for the linked list of each address's spends
this means it is a serial process, only one queue for this special task
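why the serial order matters can be shown with a small sketch. it assumes each address keeps a lastspend index into an append-only spend list, and each spend record points back at the previous spend of the same address. the record layout and the name apply_bundles are made up for the example, not iguana's actual structures

```python
def apply_bundles(bundles):
    """Apply bundles strictly in order, one at a time.

    Each bundle is a list of (kind, address, amount) with kind being
    "recv" or "spend". Returns (balances, lastspend, spends) where
    spends is append-only and each entry is (address, amount, prev_index),
    prev_index being the address's previous spend or -1 if none.
    """
    balances = {}
    lastspend = {}
    spends = []
    for bundle in bundles:
        for kind, addr, amount in bundle:
            if kind == "recv":
                balances[addr] = balances.get(addr, 0) + amount
            else:
                balances[addr] = balances.get(addr, 0) - amount
                spends.append((addr, amount, lastspend.get(addr, -1)))
                lastspend[addr] = len(spends) - 1
    return balances, lastspend, spends


bundles = [
    [("recv", "A", 10), ("spend", "A", 3)],
    [("recv", "B", 5), ("spend", "A", 2)],
]
balances, lastspend, spends = apply_bundles(bundles)
```

since the bundles are always applied in the same order, the prev_index chain for each address comes out identical on every run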
in parallel to all this is the construction of the mainchain and it is also going as a single special task
The above creates a set of readonly files for 95%+ of the blockchain. It also creates a set of vin datafiles that can be verified in the background and then purged. The only data that changes is the unspent status and the address balance/lastspend location.
The final issue is how to efficiently update the above volatile data structures atomically and how to deal with the partially sized bundle for the realtime blocks. One approach for the latter is to just regenerate the full bundle for each block, which is ok for small blockchains, but when it takes a minute or two to recalculate, at least the latest block would need to be specifically searchable.
The tricky part is resuming...
Let us assume that we have a validated set of volatile data as of the last bundle boundary. This will be rather large, since it has entries for all the unspents and addresses. Even at 8 to 12 bytes each, it still adds up.
In order to calculate the current balance, all the current bundle's spends would need to adjust the large bundle boundary dataset. The spend vector is relatively fast to create, so this can be updated for each block, especially as it is append-only.
On restart, the bundle-specific balance data is regenerated to match the blocks in the partial bundle, so adding the balance from the boundary dataset and the realtime dataset gives the data as of the end of the partial bundle. Then as new blocks come in, they are put into a special search set until they get added to the realtime bundle.
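The restart arithmetic can be sketched as follows. This is only an illustration under simplifying assumptions: balances are plain integers keyed by address, each block is a list of (address, change) pairs, and the names rebuild_realtime and current_balance are hypothetical.

```python
def rebuild_realtime(partial_bundle_blocks):
    """Regenerate the per-address deltas for the partial bundle."""
    deltas = {}
    for block in partial_bundle_blocks:
        for addr, change in block:
            deltas[addr] = deltas.get(addr, 0) + change
    return deltas


def current_balance(addr, boundary, realtime):
    """Balance as of the end of the partial bundle.

    The large boundary dataset is left untouched, only the small
    realtime dataset is recalculated.
    """
    return boundary.get(addr, 0) + realtime.get(addr, 0)


# validated balances as of the last bundle boundary
boundary = {"A": 100, "B": 40}

# blocks received since the boundary (the partial bundle)
blocks = [
    [("A", -30), ("C", 30)],
    [("C", -10), ("B", 10)],
]
realtime = rebuild_realtime(blocks)
```

Here current_balance("A", boundary, realtime) is 100 - 30 = 70, and the boundary dataset itself is never modified between bundle boundaries.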