Blockchain size reduction for new nodes

Peter Gregory Jr. · Jul 10, 2016

I've written a doc recently with description how pure UTXO set could be used instead of full blockchain for initialization of the nodes. This will reduce download and validation time dramatically (now is usually several days).

https://www.scribd.com/document/317130737/Bitcoin-On-Chain-Pruning

TLDR:
Nodes make a snapshot of all history and include the hash of it after waiting 4096 blocks.
This is a discrete process that happens once in 4096 blocks (~1 month) and doesn't require any ongoing load on nodes.
With inclusion of hashes in coinbase transactions of main chain - it will be covered by the full power of PoW.

Interested to here opinions...

Peter R · Jul 10, 2016

Like I've said before, I think this is a good idea and something that we should try to implement in Bitcoin Unlimited.

Perhaps one of the first tasks would be to unambiguously define the file format for the "UTXO block."

We need to decide on exactly what information needs to be retained and in which order the UTXO set should be sorted. (I guess for each output we need at least the transaction hash, the output's index, the value, and the scriptPubKey.)

We also need to define how the outputs are hashed. Which hash function is used and whether we hash into a Merkle tree or not (hashing into a Merkle tree would allow these UTXO blocks to more readily be used as the snapshot file for spinoffs).

Once we have this, then we could probably add code to BU to generate the UTXO blocks and the associated hash.

We could then publish the UTXO blocks on our website and write the block's hash into OP_RETURN outputs sent from some "trusted" BU address. Of course, we would also try to get miner support so that we could migrate to a trustless scheme in the future.

solex · Jul 10, 2016

@Peter Gregory Jr.
Nice write-up.
Yes, waiting for 95% of hash-power requires full backing by Core. Since the hash of the previous UTXO block is constant for a month then it could be included in the coinbase txn of any random block, not just a multiple of 4096. With a few % of the miners writing it then the UTXO set could still be confirmed when a new node starts up and interrogates any BU peers for the hash value as a cross-check.

Peter R · Jul 10, 2016

@solex

Yeah, I think the protocol needs to be very flexible in order to be useful before the majority of the hash power is onboard (which may never happen).

At this stage, I don't see why a period for UTXO blocks needs to be specified at all. If the UTXO block's file header specifies the block that the file is current to (as described in @Peter Gregory Jr.'s document) then that's all that really matters, right?

Peter Gregory Jr. · Jul 10, 2016

Thanks for your feedback.

To address your questions:

When the hash of UTXO is inserted into main blockchain - any random block doesn't really work. As well as it doesn't work in the way that a header of UTXO block specifies the block of the main chain.

Any malicious miner can create fake UTXO block and link it to a block where he includes it, given that he has enought power to mine a block.

The idea of having a particular height (multiple of 4096) is done on purpose to limit the possibility to inject connection to UTXO, and it works only when majority of the network accepted this algorithm.

In case the reality is that we need to have it working before majority of mining power accepts it - I don't see how it could be added by miners to raw data of blocks (in the way it is described in the doc via coinbase txn).

The point of having the hash in blocks 4096 had several reasons:
- predictable place to look for UTXO hash
- leaving another 4096 blocks on top of it, which presumes security

But as I said if majority of hash power doesn't accept it, there is no sense to include it in coinbase transaction, as anybody could do this in a malicious way.

Probably the working solution for BU could be, that there is a trusted BTC address, which has transactions with OP_RETURN containing hash. And then somebody issuing it manually after generating file. But this could be done only as a proof-of-concept way, because that basically forces to trust BU developers, which I think community won't like. Also how to get info about last txns from this address, via SPV way, then it could be falsified by Sybil attack?

solex · Jul 10, 2016

@Peter R
True, but computing the hash of ~2GB must be a hefty overhead, so nicer to only do it rarely, like every 4096 blocks.
Edit: smaller size than first posted.
@Peter Gregory Jr.
A malicious miner creating a hash to a bad UTXO set (maybe with a fake balance he can spend from), presumably aims to be more than a nuisance, and wants to try to profit. All the nodes that are up-to-date will know the correct hash at the previous 4096 multiple so a node in resync should compare the hash of the UTXO block received with the value interrogated from all peers. So the rogue miner would also need a large set of Sybil nodes. Perhaps infeasibly large because a node in resync could always wait until all historical blocks are received, before transacting, if there is a discrepancy between peers.

Peter Gregory Jr. · Jul 10, 2016

@solex

> presumably aims to be more than a nuisance, and wants to try to profit.

Well, if it is implemented only in BU, I clearly see how any Core supporter can do this, and come publicly blaming what a "lame" implementation other clients have and there is only one true version. Not necessary profit in the way you normally think of an attack.

> All the nodes that are up-to-date will know the correct hash

Well, if in the end you rely on the network and what other connected nodes to you communicate - then I don't see any relevance of including the UTXO hash on the main blockchain. You just have a network of nodes, who follow algorithm, e.g. calculate hash of UTXO set at particular height in the past and store it in parallel in order to send to new nodes. The first thing a new node does is asking what is the last hash from all connected nodes, and then downloads a file and compares the hash of it with what he received from absolute majority. But the whole point of the doc was to make the proof that the UTXO is proper one by covering it with PoW hashing power, and not trusting nodes connected to you.

solex · Jul 11, 2016

OK. What we are looking at is to see whether the idea could be implemented in a weaker form so that some benefit is derived sooner than later, because later depends upon Core and based on their track record they will first reject new ideas as NIH, then produce their own rebadged reworked version.

awemany · Jul 11, 2016

@solex:

Agreed. I think the we want to demonstrate how and that it could be done first. I should start to make some nice graphics on that for a medium article or so...

So I think just calculating the UTXO hash and just having a BU node regularly create OP_RETURNs with the hash would be good enough to show that, yes, it can be done in real time, works, and maybe even can be used to bootstrap (trusted for now) nodes. I still fully agree with you, @Peter Gregory Jr. , as I did on reddit, that enforcing this eventually is the path forward.

@Peter R:

I surely hope that the miners are willing to enforce this eventually. If >50% is agreeing on a certain proposal, that will be law in Bitcoin space, and can be relied upon eventually.

As it will make operation of a full node easier, I expect the incentives of the miners to align with implementing and enforcing such a standard. (Ok, we're not sure that miners still fully follow their incentives and are worried about that in context of their actions on the MBSL, but we still have to believe that, eventually, they'll all do what's right for the chain)

In case the thing is enforced in consensus code, the majority of hash power would need to agree to fake a UTXO set and it would need to convince the full nodes that are running that this new UTXO set is right. Because they'll check that hash and reject blocks that don't agree, I think we're back to the simple old >50% honest HP equation and everything would work out.

Peter Gregory Jr. · Jul 11, 2016

I posted it on reddit bitcoin sub (https://www.reddit.com/r/Bitcoin/comments/4s9wfp/new_bitcoin_nodes_can_be_initiated_within_1_hour/) and as expected Greg came up with rejecting the idea at all. However, I'm glad that he was not able to come up with any single reasonable argument against there, so it looks like idea is really working.

Can you guys advice - as far as I understand there is no sense to waste time and propose it via BIP repo because LukeJr highly likely won't accept it there based on Greg's feedback, right?

awemany · Jul 11, 2016

Can you guys advice - as far as I understand there is no sense to waste time and propose it via BIP repo because LukeJr highly likely won't accept it there based on Greg's feedback, right?

That is my impression, yes. But of course, you could always try. We could try to call it a BUIP/BIP combination, and see how well it goes.

And thanks for the link, reading it now!

awemany · Jul 11, 2016

A couple minor things. You say:

Its not a rehash, I don't find any description there of including it into main blockchain, which basically increases security dramatically, rather than just requesting info from other nodes.

Gmax is right one that one. He said, here:

What if the coinbase TXN included the merkle root for a tree over all open transactions, and this was required by the network to be accurate if it is provided.

So it is indeed an older idea.

I just think there is no reason that it hasn't been implemented yesterday if they're so worried about scalability.

And I am also thinking that Peter Todd's current TXO MMR idea - I haven't checked the details yet - is in principle a good one, at least the general direction is.

Regarding performance doing it only every 4K blocks: I do think that you are right that this will work and reduce the load considerably - but we need to make sure that this isn't wishful thinking and the reduction factor isn't eaten up again by the 4096 x larger differential UTXO set.

Peter R · Jul 11, 2016

I did some more digging (and had a nice conversation with Bob McElrath about the idea) and I realized that we're only looking at one of the benefits of UTXO commitments. We've been looking at their benefits for bootstrapping new nodes with the UTXO set. However, there is a second important benefit of UTXO commitments: proving that an output is unspent to an SPV client.

Imagine that the UTXO set was hashed into a Merkle tree every block and that the root hash was included in the block header (or coinbase TX). Delivering an SPV client the Merkle path from an unspent output to the root hash is proof that the output has not yet been spent in any confirmed transaction.

Achieving this second benefit is more difficult than achieving the first (node bootstrapping). It (ideally) requires that the Merkle root be update each block (a lot of hashing). What's even worse is that this hashing is in the critical orphaning path: if UTXO commitments like this were enforced it would mean a miner would have to calculate the root hash before he could begin mining on a block (and update that root hash each time he adds a new TX to the block he's working on).

Anyways, it seems like a lot of people are interested in making this work somehow, including Peter Todd and Bram Cohen. But no one has found the silver bullet yet.

Food for thought. For our purposes (bootstrapping) all these problems go away, but it would be awesome if someone could find an elegant solution to the more general problem (up-to-date UTXO commitments in each and every block).

solex said:
@Peter R
True, but computing the hash of ~2GB must be a hefty overhead, so nicer to only do it rarely, like every 4096 blocks.

Yes, good point.

nmag · Jul 13, 2016

Watching Greg Maxwell rejecting an idea, which clearly adds to the improvement of the ecosystem is such a joy

Here is a thing that needs to be taken care of: Some wallets or applications are incompatible with pruned nodes (e.g. Armory). It would be wise to have a different folder structure for a "lite" blockchain data, which doesn't break the current structure so that people can experiment with both modes.

Apart from the issue above, it is a great and very useful idea and here's why:

- a lite node is clearly not an SPV node and it is much more secure. By having the UTXO block commitment (let's forget for the moment the "rolling" commiments) for a specific height you can verify with a merkle proof that these transactions have appeared previously in the strongest chain and confirm that the sum of these outputs adds to the total number of coins that should exist up to that block. For an attacker to be able to fake this, he must be able to produce more chainwork from the first block that he faked untill the current block height of the valid chain; simply impossible, if 6 blocks are enough, then 1 month of blocks is more than enough...

- we can still get the benefits of lite mode even without a consensus rule. Just add some trusted BU developers, which sign the UTXO commitments. The security of the UTXO set is again perfectly guaranteed. Then just consider the marketing benefit that BU will have over other implementations: "Get all the benefits of the secure bitcoin network with non of the hassle".

- a full validation of the blockchain can be run in the background. The node can bootstrap and be usable very fast, while a full validation is running in the background with "real time" pruning. It is still a big benefit in terms of hard disk space, unfortunatelly with no CPU benefit this time.

Thank you Peter for this very nice work. I think it is a great idea and is a very important second step for BU (the first was thin blocks) to get some more market share on the network. Consider the market in 1 or 2 years, with a much bigger blockchain and especially annoying bootstrap time...

nmag · Oct 23, 2016

I saw the port on Reddit about what the community would like to see developed in BU. There UTXO commitments where also mentioned, which I believe is a great idea for reducing minimum requirements for a full node.

However, UTXO commitments will never be accepted from Core as a protocol level change, but there might be a workaround. Instead of having the UTXO commitments embedded in the blockchain, let's have trusted BU developers sign UTXO commitments, which are propagated in the p2p network. Then any client has the choice to accept the commitments or not... Yes, I know this creates a central point of failure, but with minimal risk involved. There is always the fall back option of full blockchain validation, a multiparty signature can mitigate the risk of developers' accounts been compromised and as a result we would enjoy a growth in full nodes powered by BU.

After all, there is already a similar central point of failure embedded in the Core code, called checkpoints... signed UTXO commitments introduce the same security risk, which is almost zero.

I find it a win-win situation, bitcoin protocol doesn't change at all and BU nodes get reduced resource requirements.

adamstgbit · Nov 24, 2016

IMO the ”UTXO block” doesn't need to be signed, and it doesn't need to be included on the blockchain...

all you need is to hardcode it into BU builds, as a " start point. "

basically all you want to prove is that the UTXO block embedded, is correctly representing the UTXO had the node actuly downloaded the full TX history up to that point.

any 3rd party with the full TX history can compute its own UTXO hash it and compare that to the UTXO BU hardcodes.

its like a code review, it doesn't need to be systematic, only needs to be done once by a few independent parties.

i have 0 problem using a UTXO block to bootstrap my node if i know that this UTXO has be independently verified.

why do we trust that bitcoinCoreV0.13.1 isnt sending our private keys to Mr Maxwell?
because the code has been peer reviewed. thats it.

By the same token one could say:

why do we trust that BU hardcores a legit UTXO block.
because the code has been peer reviewed...

awemany · Nov 25, 2016

Peter R said:
I did some more digging (and had a nice conversation with Bob McElrath about the idea) and I realized that we're only looking at one of the benefits of UTXO commitments. We've been looking at their benefits for bootstrapping new nodes with the UTXO set. However, there is a second important benefit of UTXO commitments: proving that an output is unspent to an SPV client.

Imagine that the UTXO set was hashed into a Merkle tree every block and that the root hash was included in the block header (or coinbase TX). Delivering an SPV client the Merkle path from an unspent output to the root hash is proof that the output has not yet been spent in any confirmed transaction.

Achieving this second benefit is more difficult than achieving the first (node bootstrapping). It (ideally) requires that the Merkle root be update each block (a lot of hashing). What's even worse is that this hashing is in the critical orphaning path: if UTXO commitments like this were enforced it would mean a miner would have to calculate the root hash before he could begin mining on a block (and update that root hash each time he adds a new TX to the block he's working on).

Anyways, it seems like a lot of people are interested in making this work somehow, including Peter Todd and Bram Cohen. But no one has found the silver bullet yet.

Food for thought. For our purposes (bootstrapping) all these problems go away, but it would be awesome if someone could find an elegant solution to the more general problem (up-to-date UTXO commitments in each and every block).

Yes, good point.

I think one solution is to have a TXO commitment that is trailing the chain tip by an amount that is enough to not having to recalculate live merkle trees all the time.

I think Cohen/Todd said as much on the ML somewhere and for once I think I agree with them on the general gist of the plan.

However, this still doesn't solve the SPV problem unless SPV clients wait a long time until spending coins.

@adamstgbit :

IMO the ”UTXO block” doesn't need to be signed, and it doesn't need to be included on the blockchain...

all you need is to hardcode it into BU builds, as a " start point. "

basically all you want to prove is that the UTXO block embedded, is correctly representing the UTXO had the node actuly downloaded the full TX history up to that point.

any 3rd party with the full TX history can compute its own UTXO hash it and compare that to the UTXO BU hardcodes.

But if you think about it, you could do amazing things with a blockchain that has most nodes operating on UTXO sets:

- storage requirements go down by a lot because if the nodes enforce this, it would be enough to go by the UTXO set and maybe 1 year of transaction data to bootstrap a node.

- the set view itself could be partial: Old coin outputs are coalesced and just the merkle root of 'old coins' is stored (with some TBD parameters). If someone wants to spend them, s/he'd need to supply the full merkle branch to their transaction, but that would just be O(log2(size(old-UTXO-set)).
This would further reduce storage/memory requirements of full nodes to O(transaction-rate) which, for the whole planet using Bitcoin, would probably be like O(1).

It is a conceptually simple improvement and I am still wondering why Core didn't put more energy into this, saying they are supposedly so concerned about Bitcoin's scalability.

adamstgbit · Nov 25, 2016

@awemany
alot of what you say goes right over my head....
not sure why a node with need to download a full 1 year of transaction data.

i'm saying, can we get the "UTXO super pruned TX history database" thingy to be hardcoded into the BU client, maybe this would make the BU exe like 100MB big, but it would mean your node starts off and doesn't have to download 8 years of data, only 3 weeks because the built is 3 weeks old.
is this possible?

core isn't about scaling bitcoin, core is about scaling Blockstream.

whats BU's excuse for not doing this!?

Fuck if i could simply download 100MB and BOOM i have a full node thats reporting fully synced within 20 mins i'd be MOFO gratefull

awemany · Nov 30, 2016

@adamstgbit:

alot of what you say goes right over my head....
not sure why a node with need to download a full 1 year of transaction data.

Assume UTXO commitments are a mandatory ('consensus critical') part of the blockchain. The scenarios I have in mind is as follows:

a) You just download the latest UTXO set from somewhere, and then you look into the blockheaders that you also receive from somewhere and you see they match so you assume they are correct.

b) You download the UTXO set from a year back, plus all the transactions since last year, and the blockheaders.

In scenario a) a single miner could, if he could fully control your view of the network, fool you into accepting an invalid UTXO by mining just one fake block on top of the current chain and just giving you that, with whatever UTXO set he has in mid.

He can't fool you about inflation, because the UTXO merkle tree could contain partial sums of all amounts below each node and the topmost node could be checked against the inflation schedule.

In scenario b), the miner would in addition need to produce a full year of UTXOs, including proper fake downward difficulty adjustments (and that makes it suspicious!) so that he can actually mine all those blocks. Assuming difficulty graphs and periodic block headers hashes are something widely disseminated (something you could get through a second channel), this would make the attack proportionally harder. You could manually check those as well.

In this scenario, I can only see the whole network conspiring to lie to you for a full year for you to accept the fake UTXO set.

As I expect -if Bitcoin retains value- a lively Bitcoin network with full nodes entering and exiting on various occasions (with overlapping timespans), this seems very difficulty to pull off, even if all nodes would just work on partial, 1-year history.

(And again, he can't create inflation with a properly constructed UTXO tree, either)

Does that make sense?

adamstgbit · Dec 1, 2016

@awemany

In scenario a) a single miner could, if he could fully control your view of the network, fool you

i'm suggesting the initial UTXO is hardcored into the build.

in that case, in much the same way you trust BU does not email your private keys to ?Mr. BU? you can trust the UTXO is valid.

Blockchain size reduction for new nodes

New Member

Well-Known Member

Moderator

Well-Known Member

New Member

Moderator

New Member

Moderator

Well-Known Member

New Member

Well-Known Member

Well-Known Member

Well-Known Member

New Member

New Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member

Well-Known Member