Blockchain size reduction for new nodes

awemany

Well-Known Member
Aug 19, 2015
1,387
5,054
@adamstgbit: Yes, probably a good idea. However, theoretically you'd be able to read all of BU's source and discover shenanigans, whereas you won't necessarily be able to tell as easily when a hardcoded UTXO hash is wrong, wherever it comes from.
 

honeybadger

New Member
Dec 14, 2016
4
3
This only works if the UTXO tree hash is enforced by the consensus. It would have to become a special field in the block header. Otherwise it's useless.

@Peter R
True, but computing the hash of ~2GB must be a hefty overhead, so it's nicer to only do it rarely, say every 4096 blocks.
Updating the UTXO set should take O(countOfNewTransactions * lg(sizeOfUtxoSet)) per block, which is trivial.

--
Protocol logic:
1. Nodes and miners keep the utxo set for a 'maximum reasonable forking time'. That's what it boils down to: if you accept one-month-long forks, you need to keep one month of full blocks.
2. Every block has a hash of its current utxo set.
3. Nodes keep two utxo sets: one from the 'maximum reasonable forking time' ago, and the current one. They use the current one to verify transactions. The old one is sent to new nodes along with the full blocks since that block height.

When a small orphaning (reorg) happens, nodes have to recompute the utxo set.

Performance optimizations:
Keep an additional utxo set at (last month's maximum orphan length + 1) blocks in the past, say 4. When a new block gets orphaned, recompute the utxo set from that one, at negligible cost.
In case of deeper forks the utxo set would have to be recomputed from the oldest one; however, this is likely to happen very rarely, if ever.

In any case, even on a weak PC, generating the utxo set from a month-old one plus full blocks should take under a minute.
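For concreteness, a minimal Python sketch of this snapshot scheme (the names SNAPSHOT_DEPTH, HASH_INTERVAL, hash_utxo_set and the block/outpoint layout are made up for illustration, not taken from any actual client):

Code:
import hashlib

SNAPSHOT_DEPTH = 4320   # ~1 month of blocks: the 'maximum reasonable forking time'
HASH_INTERVAL = 4096    # recompute the full set hash only this often

def hash_utxo_set(utxo):
    # Hash the UTXO set in a canonical (sorted) order, independent of DB layout.
    h = hashlib.sha256()
    for outpoint, txout in sorted(utxo.items()):   # keys and values are bytes
        h.update(outpoint + txout)
    return h.hexdigest()

def apply_block(utxo, block):
    # Spend the block's inputs and add its new outputs, in place.
    for outpoint in block["spent"]:
        del utxo[outpoint]
    utxo.update(block["created"])

def process_block(current, snapshot, recent_blocks, block, height):
    apply_block(current, block)
    recent_blocks.append(block)
    # Keep the old snapshot exactly SNAPSHOT_DEPTH blocks behind the tip;
    # it is what gets served to bootstrapping nodes, together with recent_blocks.
    if len(recent_blocks) > SNAPSHOT_DEPTH:
        apply_block(snapshot, recent_blocks.pop(0))
    # Per-block updates stay cheap; the expensive full hash is done only rarely.
    if height % HASH_INTERVAL == 0:
        return hash_utxo_set(current)
    return None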
 

awemany

Well-Known Member
Aug 19, 2015
1,387
5,054
@honeybadger : Yes, exactly, making that eventually mandatory consensus is what I am thinking about. That can even be done with a soft fork(TM). In any case, I think providing trusted, manual chain snapshots might also be a good idea and good enough for some.
 

adamstgbit

Well-Known Member
Mar 13, 2016
1,206
2,650
@adamstgbit: Yes, probably a good idea. However, theoretically you'd be able to read all of BU's source and discover shenanigans, whereas you won't necessarily be able to tell as easily when a hardcoded UTXO hash is wrong, wherever it comes from.
I guess validating the UTXO hash might require a little work, but it only needs to be validated once.
The build itself has a hash.
If a few independent parties confirm the UTXO hash is valid, then as long as the build's hash is valid...


@honeybadger
This only works if the UTXO tree hash is enforced by the consensus. It would have to become a special field in the block header. Otherwise it's useless
It's simply some code you trust to run because it's been peer reviewed.
@honeybadger : Yes, exactly, making that eventually mandatory consensus is what I am thinking about. That can even be done with a soft fork(TM). In any case, I think providing trusted, manual chain snapshots might also be a good idea and good enough for some.
100% agree
It would be best if nodes provided the absolute latest version of the "bootstrapping-UTXO" on demand, and this could always be easily verified by checking its corresponding hash in new blocks.
That would be neat as hell!
New nodes could always start up with only one week of full blocks to download, worst case,
and old nodes wouldn't need to provide GBs' worth of uploads to new nodes.

Having a hardcoded UTXO set is a good way to phase in this functionality.
 

honeybadger

New Member
Dec 14, 2016
4
3
Regarding performance, it turns out updates can be made linear in the number of updates, completely independent of utxo set size, with the help of an additively homomorphic hash. So hashing performance is a total non-issue.
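To show the additive idea, a toy Python sketch where the set digest is just a modular sum of per-element hashes, so each spent or created output changes the digest in O(1) regardless of utxo set size (a real scheme would use a secure multiset hash such as ECMH; this sum is only a didactic stand-in and not secure):

Code:
import hashlib

P = 2**255 - 19   # a convenient large prime modulus (illustrative choice)

def element_hash(utxo_entry: bytes) -> int:
    return int.from_bytes(hashlib.sha256(utxo_entry).digest(), "big") % P

def add_output(digest: int, entry: bytes) -> int:
    # Creating an output: O(1), independent of how big the utxo set is.
    return (digest + element_hash(entry)) % P

def remove_output(digest: int, entry: bytes) -> int:
    # Spending an output: also O(1).
    return (digest - element_hash(entry)) % P

def update_digest_for_block(digest: int, spent: list, created: list) -> int:
    # Per-block work is linear in the block's own inputs and outputs.
    for e in spent:
        digest = remove_output(digest, e)
    for e in created:
        digest = add_output(digest, e)
    return digest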

In any case, I think providing trusted, manual chain snapshots might also be a good idea and good enough for some.
It's already possible to do, e.g. by torrenting and seeding a pruned database.

@adamstgbit
Having an old utxo set in the client would create tons of confusion and a security danger for minuscule gains.
Changing the protocol to verify utxo hashes means everybody can delete old history.
With the utxo set embedded in the client that's not possible, because everyone would effectively be trusting the developers. If you trust some entity, mining stops making sense.
The only gain is fewer GBs to download, as storage-wise you can already prune.
 

adamstgbit

Well-Known Member
Mar 13, 2016
1,206
2,650
@honeybadger
I guess I mean having a pruned database hardcoded into the client.

Trusting devs is not necessary; trust the peer review of the code they produce. The pruned database is just another piece of code that's peer-reviewed.

How would you go about trusting a "torrented pruned database"? People would look at it, and then declare that if the pruned-database file you downloaded hashes to _BLA_, it's good!

Same thing with the BU binaries.

Not having to download & verify GBs is a massive gain IMO.
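The verification being described is just a file hash comparison; a quick Python sketch (the filename and expected digest below are placeholders):

Code:
import hashlib

def sha256_file(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "<hash published for the release>"
if sha256_file("bitcoinunlimited-0.12.x.exe") == expected:
    print("matches the peer-reviewed build")
else:
    print("does NOT match - do not run it")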
 

awemany

Well-Known Member
Aug 19, 2015
1,387
5,054
It's already possible to do, e.g. by torrenting and seeding a pruned database.
True, I just think streamlining that process and hashing just the relevant data (and not all the DB structures, which can change and are implementation-dependent) would still be a benefit.

@adamstgbit: The problem is that while you can peer review code, you cannot peer review a hash! If there is 6335ffa...34ffe1 somewhere in the code, you cannot review that, you have to trust it.
If the whole network of miners says '6335ffa...34ffe1 is valid', and it would be easy for any archival full node (or really just any node around at the time a miner tries to get a broken UTXO set accepted by everyone else) to give a fraud proof, i.e. to say 'hash xyz is broken because it doesn't match the old UTXO set in its old header plus this changeset', then you'd gain a lot.
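A rough Python sketch of that fraud-proof check, assuming the same kind of illustrative canonical UTXO-set hash sketched earlier in the thread (the block field names are made up):

Code:
import hashlib

def hash_utxo_set(utxo):
    # Canonical sorted hash over the set contents, not over any DB structure.
    h = hashlib.sha256()
    for outpoint, txout in sorted(utxo.items()):
        h.update(outpoint + txout)
    return h.hexdigest()

def commitment_is_fraudulent(old_utxo, blocks_since, claimed_hash):
    # Rebuild the set from the last known-good snapshot plus the changeset,
    # then compare against the hash the miner committed to.
    utxo = dict(old_utxo)
    for block in blocks_since:
        for outpoint in block["spent"]:
            del utxo[outpoint]
        utxo.update(block["created"])
    return hash_utxo_set(utxo) != claimed_hash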
 

adamstgbit

Well-Known Member
Mar 13, 2016
1,206
2,650
@awemany
I would not include a hash of the hardcoded pruned database, nor would I make nodes try to validate other nodes' pruned-database code.

People can review the hash...
All you have to do is hash the bloody exe file and see that it matches what BU.com says the hash ought to be for that particular build.
Now you KNOW you're running BU version 0.12.x without any alterations.
And after hashing the exe and seeing it comes out to 85jd84....n42cj8 (as you expect), there's no question you're running the same code that's been peer reviewed.
That is good enough, isn't it?

I cannot imagine a scenario in which this pruned database code would be anything other than the right thing. Can you?

@honeybadger

can you?
 

TierNolan

New Member
Nov 19, 2018
12
7
Imagine that the UTXO set was hashed into a Merkle tree every block and that the root hash was included in the block header (or coinbase TX). Delivering an SPV client the Merkle path from an unspent output to the root hash is proof that the output has not yet been spent in any confirmed transaction.
This suggestion was made ages ago on the bitcointalk forum, but was never really worked on. The link is to a fairly long thread that meanders a little.

What is really cool is that a block can include the proof that the merkle tree was correctly updated too. I discussed this property here.

For each transaction input, you include the proof that it is valid. This proof also allows you to delete that input from the set.

Similarly, for each transaction output, you include the proof that it can be inserted. That proof allows you to add the output to the set.

Basically, if I give you the merkle root of the UTXO set and a proof that an input exists, you can verify that proof and then also compute the new merkle root. The new merkle root is for the exact same set, but with the output matching that input deleted.

Likewise, if I give you the current merkle root and the path to where it should be inserted, you can add an output to the set. You can prove that it is the valid location for the output and work out the merkle root of the set with the output added; the path itself also counts as proof that the addition is valid.

So, an expanded transaction would be

Code:
- tx header
- inputs
-- proof of existence for input 0
-- input 0
-- proof of existence for input 1
-- input 1
...
- outputs
-- insertion merkle path for output 0
-- output 0
-- insertion merkle path for output 1
-- output 1
...
A full node could convert a normal transaction into an expanded transaction by adding the paths.
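A minimal Python sketch of those two operations, assuming a fixed-depth binary tree with an EMPTY marker for vacant leaves (an illustrative layout, not the exact proposal): verify the merkle path for an entry against the committed root, then reuse the same path to compute the root of the updated set.

Code:
import hashlib

EMPTY = b"\x00" * 32   # placeholder hash for a vacant leaf

def H(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()

def root_from_path(leaf: bytes, index: int, siblings: list) -> bytes:
    # Fold a leaf up through its sibling hashes to the merkle root.
    node = leaf
    for sib in siblings:
        node = H(sib, node) if index & 1 else H(node, sib)
        index >>= 1
    return node

def spend_input(root: bytes, utxo_leaf: bytes, index: int, siblings: list) -> bytes:
    # Verify the input exists, then return the root with that leaf removed.
    assert root_from_path(utxo_leaf, index, siblings) == root, "invalid proof"
    return root_from_path(EMPTY, index, siblings)

def insert_output(root: bytes, new_leaf: bytes, index: int, siblings: list) -> bytes:
    # Verify the slot is vacant, then return the root with the new output added.
    assert root_from_path(EMPTY, index, siblings) == root, "slot not empty"
    return root_from_path(new_leaf, index, siblings)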

It has a lot of bloat though :). A merkle path is needed for each proof. The number of utxos at the moment is around 64 million; that is about 2^26, or 26 levels of a binary tree. The merkle branch would be 26 hashes long, and at 32 bytes per hash, that is 832 bytes.

A 2-input, 2-output transaction would increase from 250 bytes to 3,578 bytes. This is a big increase in block size (about 14x), but it means that each block could be verified independently.

You take a block and it says what the merkle root of the UTXO set is. You can then go through each transaction and update the merkle root for each input and output. At the end, you get the updated merkle root. That root is included in the block too.

This means that someone who has never seen any other block can verify that that block is valid and connects to its parent.

An invalid blockchain would have at least one block which was invalid.