for utxo, my approach with iguana creates a parallel set of datasets in read only memory mapped files. Arguably, this is the ultimate DB as it simply cant get any faster than this.
All things needed for block explorer level queries are precalculated and put into the read only files.
By having each bundle of 2000 blocks being mostly independent of all other blocks, this not only allows for parallel syncing of the blockchain but also using multiple cores for all queries regarading tx history.
I also calculate all address balances for each bundle, so by adding all the values across all the bundles you end up with the current balance as of the most recent bundle. Special case handling of the realtime bundle gets us up to date block explorer data for all blocks, tx, addresses (even multisig)
I dont quite have everything done, but the basic structure is showing very good results. I also remove as much redundancy as possible during the creation of the read-only files so instead of taking more space than the raw blockchain, it ends up taking about half the space. About half of that is the signatures, which can be purged for non-relaying nodes after it has been validated.
The format of the utxo vector would just be the location of the unspent that is being spent and the location of the spend is implicit in the order of the vins.
[u0, u1, u2, u3, ...] is a list of 6 byte locators for the unspent that the corresponding vin in the bundle is spending, where each ui contains the bundle id and the unspentind. Each unspentind is simply the order it appears in the bundle (starting at 1).
the above is not directly usable, but it is invariant and can be created once and put into the read-only file(system). using mksquashfs reduces the size of the index tables by almost 50%.
Given such a list for all bundles, then using a parallel multicore process, they can all be traversed and update a bitmap to indicate which outputs are spent. So a 0 bit for this means it is unspent. Maybe 1 bit of space per utxo is good enough, as that is seemingly as good as it gets, but it is 1 bit per vout, so more like 30MB size. However, I think this can be compressed pretty good with most of the early vouts already being spent (ignoring the satoshi ones). I havent had a chance to get an up to date utxo bitmap, but I hope to get one in the next week.
At the 30MB size or smaller, the entire utxo bitmap will fit into CPU L cache and provide 10x faster performance than RAM. RAM is actually quite slow compared to things that can be done totally inside the CPU.
James
All things needed for block explorer level queries are precalculated and put into the read only files.
By having each bundle of 2000 blocks being mostly independent of all other blocks, this not only allows for parallel syncing of the blockchain but also using multiple cores for all queries regarading tx history.
I also calculate all address balances for each bundle, so by adding all the values across all the bundles you end up with the current balance as of the most recent bundle. Special case handling of the realtime bundle gets us up to date block explorer data for all blocks, tx, addresses (even multisig)
I dont quite have everything done, but the basic structure is showing very good results. I also remove as much redundancy as possible during the creation of the read-only files so instead of taking more space than the raw blockchain, it ends up taking about half the space. About half of that is the signatures, which can be purged for non-relaying nodes after it has been validated.
The format of the utxo vector would just be the location of the unspent that is being spent and the location of the spend is implicit in the order of the vins.
[u0, u1, u2, u3, ...] is a list of 6 byte locators for the unspent that the corresponding vin in the bundle is spending, where each ui contains the bundle id and the unspentind. Each unspentind is simply the order it appears in the bundle (starting at 1).
the above is not directly usable, but it is invariant and can be created once and put into the read-only file(system). using mksquashfs reduces the size of the index tables by almost 50%.
Given such a list for all bundles, then using a parallel multicore process, they can all be traversed and update a bitmap to indicate which outputs are spent. So a 0 bit for this means it is unspent. Maybe 1 bit of space per utxo is good enough, as that is seemingly as good as it gets, but it is 1 bit per vout, so more like 30MB size. However, I think this can be compressed pretty good with most of the early vouts already being spent (ignoring the satoshi ones). I havent had a chance to get an up to date utxo bitmap, but I hope to get one in the next week.
At the 30MB size or smaller, the entire utxo bitmap will fit into CPU L cache and provide 10x faster performance than RAM. RAM is actually quite slow compared to things that can be done totally inside the CPU.
James