Fraud Proofs

Mengerian · Nov 24, 2016

The purpose of this thread is to discuss and gather information about fraud proofs.

I am still working through and trying to wrap my head around many of these concepts, so I intend this thread to be a sort of work in progress. Hopefully others may find this useful, and can share ideas and information about fraud proofs.

I’m hoping to better understand and develop ideas such as:

What is the purpose of fraud proofs, what are the goals to aim for
Looking at different options for changes to the protocol would be best to enable better fraud proof capabilities
How can they be incentivised, are there business models that can be enabled to provide fraud proof services to light nodes

A primary reference and starting point is @Justus Ranvier’s article on compact fraud proofs: https://gist.github.com/justusranvier/451616fa4697b5f25f60

Also relevant are @deadalnix’s proposals on Merklix Trees:
http://www.deadalnix.me/2016/09/24/introducing-merklix-tree-as-an-unordered-merkle-tree-on-steroid/
http://www.deadalnix.me/2016/09/29/using-merklix-tree-to-checkpoint-an-utxo-set/#more-455
http://www.deadalnix.me/2016/11/06/using-merklix-tree-to-shard-block-validation/
And some discussion in @deadalnix’s BUIP037 proposal thread: https://bitco.in/forum/threads/buip037-hardfork-segwit.1591/

---------------------------

Many of the data structures in Bitcoin revolve around proving things:

The proof of work proves that real-world resources were expended associated with a block header hash
The hash chain of block headers proves that all blocks in the chain are linked with the proof of work of the chain tip
The merkle tree of transactions in a block can be used to prove, in compact way, that a particular transaction exists within that block.
The signature data associated with a transaction proves that the creator of the transaction has control of the necessary private key

All of these data structures are combined to allow users of the system to have confidence in certain properties of the system such that it can be used to enable monetary behavior. The nice thing about Bitcoin is that anyone can verify the integrity of the system, in a permissionless manner, by running a full node.

The data structure of the block chain can also be used to prove things to others who do not have all the blockchain data. For example, the existence of a transaction in the block chain can be proven to exist with only the block headers and a Merkle path pointing to the transaction. SPV clients currently use this method to check that their relevant data is included in the block chain.

Other things are more difficult to prove. For example, proving that a particular piece of data is absent from the block chain is more difficult. If a miner wanted to include a transaction in a block where the input did not exist, the only way to prove this fraud, with the current block chain, would be to scan the entire block chain to to show that the input is absent. SPV wallets rely on the fact that other full nodes are checking the validity of the blocks that are mined to have confidence that this type of fraud is not occurring.

The concept of Compact Fraud Proofs is to be able to prove, as efficiently and possible, all the different types of fraud to clients who do not have a full copy of the block chain. The reason to do this is to create an asymmetric advantage for honesty over fraud.

Currently the security of SPV clients relies on the assumption that the majority of has power is honest. This is a fairly strong security guarantee, a collusion between the majority of miners seems unlikely. But with compact fraud proofs we can do even better. In a world where compact fraud proofs are in use, the light nodes only rely on the assumption that one honest node exists who can provide the compact fraud proofs.

Two things need to be done to move toward a system where Compact Fraud Proofs are practically implemented:

Make changes to the Bitcoin block chain and/or add consensus rules to make all the necessary types of compact fraud proofs possible.
Add mechanisms to enable a market of fraud protection providers and consumers.

Some areas for further investigation:

Different data structures to enable compact fraud proofs (eg Merklix trees)
How to enable a market for fraud protection and compact fraud proofs
How fraud proofs, or similar concepts, can help nodes with data pruning and sharding
The relationship between compact fraud proof capabilities and UTXO. Do we need better fraud proof capabilities before implementing UTXO commitments, or do UTXO commitments provide many similar benefits as Compact Fraud Proofs
Different types of fraud, and how this influences incentives. For example, there is a difference between proving that the individual coins are properly controlled by an individual user of the system, versus proving that the overall system preserves monetary properties such as scarcity.

Mengerian · Nov 25, 2016

Question: Why is the witness data in Segregated Witness put into a Merkle tree committed into the block?

For a transaction to be considered authoritative, two requirements need to be satisfied:
1. It is not double spent. Only one transaction spending particular coins can be authoritative.
2. It has a valid signature. This means that the holder of the coins consented to spend them.

Transactions identifiers (TXID) are included into blocks to prevent double spending. Full nodes validate that conflicting transactions are not mined into the block chain, and the proof of work makes it so that the network converges on one valid chain. A Merkle tree of unique identifiers of transactions (ie, TXID) is built, and the Merkle root put in the block header. This allows one to prove that a certain transaction is "authoritative" by providing the block header and Merkle path to that transaction. Conflicting transactions would be considered non-valid, cannot be included in the block chain as full nodes would reject them.

Witness data is different, however. Here all we need to prove is that at least one valid signature exists. This proves that the holder of the secret key signed the transaction. If other similar signatures exist, whether valid or not, it doesn't matter. There is no need to "de-conflict" signatures. In fact, signatures can be malleated by third parties, so unlimited numbers of valid signatures could be generated for each transaction. Witness data refers to the unique transaction data, so the witness data itself provides a link proving it is associated with a particular transaction. If someone wants to prove that a valid signature exists for a transaction they need to save the witness data, and provide it as proof.

What is gained by building a Merkle tree structure for witness data?

Bloomie · Nov 25, 2016

Thanks, Mengerian.

This thread was flagged as "spam" on r/bitcoin for some reason.

Mengerian · Nov 25, 2016

@Bloomie that's strange

Roy Badami · Nov 26, 2016

I'm not sure if you're being ironic, but I don't find it at all strange. You're questioning the design decisions behind segwit, and therefore by implication you're questioning the narrative that the Core developers are somehow uniquely qualified to maintain this system that no one else in the world has the skills to understand.

In that subreddit pretty much anything that doesn't fit their agenda is banned.

EDIT: And thanks for the interesting analysis - I hadn't thought about it in those terms before.

Mengerian · Dec 6, 2016

@Roy Badami yeah, I guess it shouldn't be surprising. It's funny because I'm not even meaning to criticize segwit here, or the concept of the wtxid Merkle commitment. I am just legitimately trying to work through the logic of how the data structures are designed so that I can better understand the dynamics of Bitcoin as a system. I would actually welcome someone to come along, point out where I’m mistaken or missing something, and help me improve my understanding.

The witness data is one of the most fundamental cryptographic proofs in the whole Bitcoin system. It is the mechanism that ensures that only the valid holder of the coins, who has the private key, can spend them. So it seems, intuitively, that you would want data structures that enable proofs that 1) a transaction has a valid signature, or conversely 2) the transaction does not have a valid signature.

Case 1 is easy, just providing the witness data is proof that the transaction has valid signature. The signature by itself is sufficient to prove that it validly signs particular transaction. You don’t need it any special data structure for this. Internally nodes may want to save witness data in a way that makes lookup easier, but that is a separate issue than whether we need a consensus enforced data structure.

Case 2 is more difficult, I would divide it into two sub-categories: 2a) a node is provided with witness data, but that signature is invalid, and 2b) the node does not have access to witness data.

In case 2a, having the witness data in a Merkle tree is useful. It allows a full node to prove, in a compact way, that a particular instance of witness data exists in the block, and that the witness is not valid. This is a fairly trivial case.

But what about case 2b? Say a majority of miners collude to produce blocks with no witness. They simply put a random number where the witness Merkle root is expected. Could one of the colluding nodes produce a proof of the fraud? (this is the type is asymmetric situation Fraud Proofs are designed for).

An honest node could produce a plausible valid witness Merkle tree, and show that the root is different from that provided by the cartel. But since signatures are malleable, and infinite number of valid roots exist, this proves nothing. Essentially, the problem is that Merkle trees are very good at providing compact proofs that something exists, but proving that something is absent is more difficult.

It seems that the solution to this conundrum is to say that nodes will not consider blocks valid unless they include witness data. But this seems to be circular logic in the context of compact Fraud Proofs. How can we prove that this condition was violated without experiencing it directly?

Mengerian · Dec 18, 2016

Some further thought about Segregated Witness.

With the current block structure, it is impossible to provide a Merkle proof that a transaction has been included in the block chain without also having to provide signature data. This is because the signature is included in the data that is hashed to produce leaf of the transaction Merkle tree.

The transaction hash is also used as the TXID used to point to the transaction outputs when they are spent by other transactions. The problem with this is that the TXID, since it includes signatures, is malleable, meaning that it can be changed by third parties (i.e., not the signer) before it is included in a mined block.

These are two distinct concepts:
1) TXID as a unique pointer to refer to previous outputs in transactions
2) Transaction hash as a Merkle leaf enabling compact proofs that a transaction is included in a block.

Segregated Witness solves the malleability problem by removing the signature data when calculating TXID. This is very good for using in transactions as a unique pointer to previous transaction outputs.

But Segregated Witness also simultaneously changes the Merkle tree structure used to prove that a transaction has been mined into a block. It introduces the possibility the nodes can prove to light clients that a transaction has been mined in a block, without needing to also provide the signature data associated with the transaction. Whether this is a good property or a problem is unclear to me.

I don't see any technical reason why these two distinct concepts couldn't be kept separate. It just means that the TXID used to refer to transactions would be a subset of the transaction hash (TXH) used to build the Merkle tree of transactions in a block. If using @deadalnix's Merklix tree structure for transactions in blocks, the TXID could even be used as an index to order the transactions, making proof of absence possible, with the full transaction data, including signatures, used to build the hash tree.

Justus Ranvier · Dec 18, 2016

Mengerian said:
But Segregated Witness also simultaneously changes the Merkle tree structure used to prove that a transaction has been mined into a block. It introduces the possibility the nodes can prove to light clients that a transaction has been mined in a block, without needing to also provide the signature data associated with the transaction. Whether this is a good property or a problem is unclear to me..

The purpose of a fraud proof is to show that miners have included an invalid transaction in a block.

This is possible as long as the transaction input scripts are part of the data that's hashed to create the block's Merkle root.

Mengerian said:
It just means that the TXID used to refer to transactions would be a subset of the transaction hash (TXH) used to build the Merkle tree of transactions in a block. If using @deadalnix's Merklix tree structure for transactions in blocks, the TXID could even be used as an index to order the transactions, making proof of absence possible, with the full transaction data, including signatures, used to build the hash tree.

That's correct. The easiest way is to concatenate the txid (does not include input scripts) and the hash of the complete transaction (including input scripts) and hash that to form the leaf node.

Mengerian · Dec 18, 2016

Justus Ranvier said:
The purpose of a fraud proof is to show that miners have included an invalid transaction in a block.

This is possible as long as the transaction input scripts are part of the data that's hashed to create the block's Merkle root.

Yes, and this is not the case for Segregated Witness. The signature data is in a separate Merkle tree whose root is included in the coinbase transaction.

So to prove an invalid signature in a Segregated Witness transaction, a separate Merkle proof would need to be produced to prove the existence on the invalid signature in the Witness Merkle tree. And in a scenario where miners are colluding to mine invalid transactions, couldn't they also collude to withhold the witness data?

That's correct. The easiest way is to concatenate the txid (does not include input scripts) and the hash of the complete transaction (including input scripts) and hash that to form the leaf node.

Cool, yes, this is what I had in mind. I think you described it better than I did though.

In this case, if you show that a transaction is in a block, you necessarily have to provide the input scripts to prove that the hash forms the leaf of the Merkle tree. So it would be impossible for colluding miners to withhold signature data and simultaneously prove to light nodes that the corresponding transaction is mined in a block.

Mengerian · Dec 18, 2016

Justus Ranvier said:
That's correct. The easiest way is to concatenate the txid (does not include input scripts) and the hash of the complete transaction (including input scripts) and hash that to form the leaf node.

Actually, now that I read what you wrote more carefully, I think we have slightly different ideas.

My concern with the structure you describe is that malicious nodes could show a Merkle proof pointing to a TXID, but withhold the input scripts (providing only the hash, but not the data used to calculate the hash). This opens to door to the signature withholding attack I described above.

I imagine the data structure of the transaction Merkle tree exactly the same as currently. To fix malleability, you could simply define a TXID as the hash of the transaction, skipping input scripts. A new transaction version would use the new TXID (TXID-new) in place where of the old TXID, but otherwise could be exactly the same. TXID-new would not appear directly in the block data structure, but could be calculated retroactively for all transactions currently in the block chain. The node software would also presumably have to maintain some sort of index of TXID-new to be able to find them quickly.

Justus Ranvier · Dec 18, 2016

@Mengerian I don't see any attack in what you're describing.

A fully validating node is not going to accept blocks it can't fully validate, which means having all the information needed to valid the block. Rearranging where that data is encoded in a block doesn't change that.

A light node that is interpreting proofs provided by a full node has no obligation to accept invalid proofs, so if a full node generates a pseudo-proof that doesn't actually prove anything, it'd be a bug for a light client to accept it.

On a side note, it's more correct to say "script" instead of "signature". "Signature" used to be a synonym of "input script", but only in the very earliest releases of Bitcoin before we even started using addresses.

Mengerian · Dec 18, 2016

This is my attack scenario:

A cartel of miners (>51%) collude to steal coins by creating Segregated Witness transactions without valid input scripts, and mining them into blocks. They spend valid UTXO, just from coins for which they don't control the private key. There is no witness data, they simply put a random number where the witness hash should be.

Light clients are fooled into accepting these transactions. The miner cartel provides Merkle proofs that the transaction are mined into blocks, but don't need to provide the "witness" to light nodes under the Segregated Witness model.

Other full nodes (not part of the cartel) are not given any witness data, thus cannot validate the blocks and do not accept them. They reject the cartel's chain, the longest POW chain. But they also cannot provide fraud proofs to the light clients proving that something is amiss.

If one miner in the cartel defects, they also cannot prove the fraud. Since the witness hash is just a random number, there is no pre-image of invalid data that can be found to prove an invalid input script.

I realize that this may be a far-fetched scenario. The whole point of compact fraud proofs is to shift the balance of trust from having to rely on the majority of miners, to only needing one honest node who could prove the fraud.

The attack I have described makes it impossible for a single honest node to prove the fraud. We are back to no better than the SPV level of trust we have today, or else needing to run a full node to detect the fraud.

Justus Ranvier · Dec 18, 2016

I see what you mean.

This can be generalized to a new type of fraud: the block's merkle tree is invalid.

The general way to solve that is to add P2P network commands for requesting merkle trees associated with particular blocks.

The proof for that type of fraud is less conclusive than for other types. I'd call it more of a "fraud claim" than a "fraud proof". A full node could produce a fraud claim that says "The merkle root for block X can't be substantiated". A light client verify this claim by asking every full node it can connect to for the merkle tree for block X. If no node can produce a valid tree, then the light client should assume the block is invalid.

deadalnix · Dec 19, 2016

I was thinking the same thing, but I'm not happy with it. It is very easy for a node to abuse this mechanism as a DoS vector. However, I don't have anything better to propose.

EDIT: I think I have something. What if the prover put some money on the table in a specific transaction. This transaction needs to only be valid in the current chain, and not in another chain. If someone can provide the proof, that someone can claim the money. On the other hand, if nobody can, then the chain can be rollbacked and the prover recover his money.

I did not really fleshed out the details, but that looks like it could work.

Justus Ranvier · Dec 19, 2016

deadalnix said:
I was thinking the same thing, but I'm not happy with it. It is very easy for a node to abuse this mechanism as a DoS vector.

That's equally true of nearly every P2P network message though.

Mengerian · Dec 20, 2016

Justus Ranvier said:
This can be generalized to a new type of fraud: the block's merkle tree is invalid.

Hmmm. I have to think whether this is really a fundamentally different category. Seems to me that every type of fraud could be described as "Merkle tree is invalid", or at least more generally "block data structure is invalid". I agree with @deadalnix that the "fraud claim" scenario you suggested is unsatisfying.

I think that the scenario I described exposes a flaw with Segregated Witness, and even the general idea of separating the "witness" from the rest of transaction.

I see "fraud" as a contingent thing. We don't care if it is possible to produce an invalid block chain, as long as we can easily detect it is invalid. The problem with fraud is if someone is tricked into believing that they are given valid information, when in fact it is invalid.

For full nodes, with the entire block chain history, fraud is fairly easy to detect. Since they have all the relevant information, they can simply check everything themselves. Or they can detect if information is missing. Where fraud proofs become valuable is for light clients or clients with a sharded or pruned block chain.

What I am realizing is that fraud proofs are not just a feature that can be tacked on. They have to be considered in the design of the fundamental data structures.

You need a data structure where nodes cannot provide proof that a transaction is in the block chain without also providing the data needed to enable fraud detection. In other words, the data needs to be structured such that missing or invalid data can be detected. The data given to light nodes needs to structured such that either 1) fraud can be detected directly, or 2) the capability of other nodes to produce fraud proofs can be detected.

I think the Segregated Witness data structure makes fraud proofs for missing input scripts impossible. This is because it allows allows light clients to be tricked by giving them by simultaneously 1) providing a proof the transaction is in the block chain, and 2) withholding data which could prove fraud (either invalid input scripts or missing input scripts).

I think this means that all the data in a transaction needed for fraud proofs needs to be hashed together directly to produce the Merkle leaf proving its inclusion in a block. Any time you hash the data separately, and allow only a portion of the data to be provided to light nodes, you introduce the possibility for undetectable fraud through data withholding.

Justus Ranvier · Dec 20, 2016

Mengerian said:
Seems to me that every type of fraud could be described as "Merkle tree is invalid", or at least more generally "block data structure is invalid".

A block with invalid leaf nodes and a well-formed merkle root is a lot easier to deal with than a block with a malformed merkle root. Each of those categories require different strategies to prove.[/quote]

Mengerian said:
I think this means that all the data in a transaction needed for fraud proofs needs to be hashed together directly to produce the Merkle leaf proving its inclusion in a block. Any time you hash the data separately, and allow only a portion of the data to be provided to light nodes, you introduce the possibility for undetectable fraud through data withholding.

You haven't solved anything. However you declare a merkle root should be calculated, it's possible for a miner to broadcast a block with an invalid root and you still need a way to demonstrate this so that light clients can know to reject the block.

You only need 32 bytes per transaction in a block to recreate the block's merkle root. This means that a network message for querying the merkle root of an arbitrary block only needs to return about 64K for a block with 2000 transactions in it.

Mengerian · Dec 20, 2016

Justus Ranvier said:
You haven't solved anything. However you declare a merkle root should be calculated, it's possible for a miner to broadcast a block with an invalid root and you still need a way to demonstrate this so that light clients can know to reject the block.

It prevents miners from providing a Merkle proof that a transaction is in the block while simultaneously omitting input scripts, or with malformed input scripts. In other words it enables light nodes to be certain that they will reject transactions that steal coins, even if miners collude to mine such transactions.

Imagine that a group of miners collude to mine blocks with the Merkle root replaced by a random number. What type of fraud would this enable?

(I actually am seriously interested in thinking about this, it's not just a rhetorical question)

I don't see how it would enable them to trick light clients into accepting things like transactions without input scripts, transactions with missing inputs, spending previously spent outputs, etc. These are the types of fraud that need to be prevented, since they destroy monetary properties.

Zangelbert Bingledack · Dec 21, 2016

Mengerian said:
I think the Segregated Witness data structure makes fraud proofs for missing input scripts impossible. This is because it allows allows light clients to be tricked by giving them by simultaneously 1) providing a proof the transaction is in the block chain, and 2) withholding data which could prove fraud (either invalid input scripts or missing input scripts).

I think this means that all the data in a transaction needed for fraud proofs needs to be hashed together directly to produce the Merkle leaf proving its inclusion in a block. Any time you hash the data separately, and allow only a portion of the data to be provided to light nodes, you introduce the possibility for undetectable fraud through data withholding.

Oh gosh, is this yet another possible way Core/BS could be trying to checkmate us with a poison pill? By ensuring that fraud proofs are impossible, thereby ensuring "everyone" must run a node, thereby biasing things toward keeping blocks small to incentivize their offchain scaling products?

If so, the poison pill has both the 4x discount incentivizing BS products and the eternal destruction of fraud proofs as a way to ensure that BS products must be adopted to scale, with the un-reversible nature of the SW soft fork as a way to ensure the poison pill can't be coughed back up again. If this is true, that is downright diabolical.

awemany · Dec 22, 2016

@Zangelbert Bingledack : I think the general angle of 'what could SegWit prevent us from doing in the future' is very worthwhile to further explore.

Fraud Proofs

Moderator

Moderator

Administrator

Moderator

Active Member

Moderator

Moderator

Active Member

Moderator

Moderator

Active Member

Moderator

Active Member

Active Member

Active Member

Moderator

Active Member

Moderator

Well-Known Member

Well-Known Member