Fraud Proofs

Mengerian

Moderator
Staff member
Aug 29, 2015
536
2,597
@Zangelbert Bingledack Yes, it's easy to prove that something is included in a block by providing a Merkle proof. That part is trivial. What is more difficult is proving that something is missing.

One way is to simply look at all the data, search exhaustively, and if the thing in question is not there, that proves its absence. This, in essence, is now full nodes can check for "input does not exist" fraud currently.

Another way is to enforce some kind of structure to the data that makes proving absent data easier. For example, in @Justus Ranvier's fraud proof proposal you make a rule that all transactions in a block have to be ordered by TXID, then showing two adjacent transactions, one with higher TXID and one with lower is enough to prove that the TXID in question is absent from the Merkle tree in that block. Of course, in such a setup, you also have to be able to prove if the rule for data structuring has been broken (in this example showing two TXIDs not in order would prove that the rule had been broken).

Anyway, getting back to the /u/tl121 comment, I think he is incorrect that two Merkle roots are needed to solve malleability. You just need to separate the concepts of "TXID" and Transaction Merkle leaf (let's call this "TXML"). The TXID just needs to be a unique way of identifying the transaction, but it does not need to be the Merkle leaf for a transaction in a block.

The problem I have with the Segregated Witness approach to solving malleability is that it conflates TXID with the TXML. These two identifiers serve two distinct functions:
- TXID is used by to identify the transaction in inputs of other transactions
- TXML is used to prove that a transaction has been mined in a block (thus preventing double spends)

The way Segregated Witness calculates TXID is fine, but there is no reason that TXID has to be committed as the Merkle leaf proving transaction inclusion in a block. It would make more sense to commit all transaction data, including input scripts, as a Merkle leaf to prove it is in a block. This TXML would be malleable, but that is fine as long as it isn't used as the identifier for transactions in inputs of other transactions. Non-malleable TXID would be used for this purpose. All you would need to do is re-define the "previous TXID" field in the list of transaction inputs to use the non-malleable TXID, not the Transaction Merkle Leaf.
 

Zangelbert Bingledack

Well-Known Member
Aug 29, 2015
1,485
5,585
What about attestations of validity?

1. Download 1000 known public signing keys from 1000 key institutions (Bitstamp, Circle, government of Switzerland, Microsoft, Coinbase, Wikileaks, ACLU, Bank of Japan, FEE, /r/btc, ViaBTC mining pool, Edward Snowden himself, Tesla Motors, Coindesk.com, McDonald's, EFF, Citibank, etc.) in advance.

2. It becomes common practice for every institution to attest to certain facts about the latest blocks every second of the day with a new message automatically signed with their institutional keys on the website and through an API.

3. As an SPV node, if any single one of those messages claim any oddity, disagree on the state of the chain, or are not renewing every second with proper signing, you are alerted that you could be subject to a man-in-the-middle attack.

Essentially this is hybrid PoW+"ask a friend" but it crucially seems to relax the network requirement from "you must run a full node" to the much looser situation of "there must be plenty of nodes," which is what the big block position argues for.

As a practical matter, barring coders with a lot of time and ability, everyone already trusts Github+bitcoin.org (or +bitcoinunlimited.info) not to both be compromised when they download their full node client. An additional check on reddit and BCT and here and Slack, and then waiting a few weeks in case news of a compromise comes out, is the best protection anyone is realistically going to get even if they run the holy grail of verification: a full Bitcoin node.

If there is an attack scenario still possible here, I'd like to hear it. It seems like you'd have to infiltrate every one of those institutions and people, and do it all at once. If sufficiently distributed geographically, jurisdictionally, and culturally, that seems well nigh impossible.

Now it's worth noting why such a system alone wouldn't work: everyone would only have everyone else to verify the system with, and there would be no final word to anchor it. Again this is what "plenty of nodes" accomplishes. It seems there is no need for everyone to run a node, even if we don't have full cryptographic fraud proofs.
 

Justus Ranvier

Active Member
Aug 28, 2015
875
3,746
@Zangelbert Bingledack I think you've just re-discovered proof of stake. That can work - it just provides a different set of security properties compared to proof of work.

If you recall from the Bitcoin whitepaper, one of the key properties of Bitcoin which Satoshi pointed out was the ability of a node to validate what happened during a time period while it was offline.

If you want that property, then you're stuck with proof of work.

If you change the network guarantees such that it's acceptable to require a node to have 100% uptime in order to achieve 100% validation capability then there are other algorithms that can get the job done.
 
  • Like
Reactions: freetrader

Zangelbert Bingledack

Well-Known Member
Aug 29, 2015
1,485
5,585
@Justus Ranvier

I take your point that this is a PoS-like add-on (albeit entirely offchain) being made to partly bridge the gap left by SPV nodes not having compact fraud proofs, but doesn't that at least have a mitigating effect on the situation? It seems to me that as long as you have a huge amount of PoW and sufficiently distributed full nodes as an anchor, PoS-type mechanisms on top of that can provide enough security to make sense for certain use cases (I think this is effectively being done now in a much riskier way, when someone checks blockchain.info to see if their incoming tx confirmed). Without the anchor, the whole thing becomes circular and can come crashing down due to systemic vulnerability, but is there no middle ground to take advantage of?
 

Mengerian

Moderator
Staff member
Aug 29, 2015
536
2,597
Hi @Justus Ranvier
That's correct. The easiest way is to concatenate the txid (does not include input scripts) and the hash of the complete transaction (including input scripts) and hash that to form the leaf node.
I just want to probe this idea a but further. Let's consider some variations of your proposed construction of the Merkle leaf node:

1. What if you concatenate txid (hash of transaction excluding input scripts) with the full transaction data (not hashed), then hash that to form the leaf node?

In this construction, if a full node is serving a light client, it is impossible to provide a Merkle path proof that a transaction is in the block chain while simultaneously withholding input script data. The attack I described above is impossible with this construction. And we have our non-malleable txid.

Let's go a step further. If we examine this construction, we realize that the txid, which is concatenated to the transaction data, is redundant information. If a node (or light client) has the full transaction data, they can calculate a non-malleable txid by hashing the transaction excluding input scripts. So it is unnecessary to included the concatenated txid when forming the Merkle leaf.

So another construction can be:

2. Merkle leaf is simply the hash of all transaction data. The block Merkle tree would appear exactly the same as currently. However, the node would also calculate a non-malleable txid by hashing transaction data excluding input scripts. The software would recognize this non-malleable txid when the transaction is identified in input scripts of transactions spending its outputs. But the non-malleable txid would not directly be part block's Merkle tree.

I think that txid and transaction Merkle leaf node are two distinct concepts, and serve two different purposes. So it makes sense to disentangle the concepts, and treat them differently in the software.

I am focused Merkle tree data structure of blocks because I think this is fundamental to the capability of proving things about the data in blocks, including the types of compact fraud proofs that are possible.

Thoughts?
 

Justus Ranvier

Active Member
Aug 28, 2015
875
3,746
@Mengerian Yes, you could leave the txid out of the tree structure because it can be calculated from the transaction data. In some situations it's a useful optimization to have it as part of the Merkle structure though.

Suppose you added a requirement that transactions must be ordered in the tree by their txid in order to make non-existence proofs compact. You don't need the txid to be part of the merkle structure to prove violations of this rule, but if it's not part of it then you need the full transaction data of two transactions in the proof.

Right now that's not a big deal, but if transaction size limits come off then it could become significant in pathological cases.
 

Mengerian

Moderator
Staff member
Aug 29, 2015
536
2,597
Am I correct in thinking that with Segregated Witness, the proof that transaction data is included in a block requires not one, not two, but three Merkle proofs?
1. Merkle proof that the regular transaction is in the block
2. Merkle proof pointing to coinbase transaction containing Merkle root of witness Merkle tree
3. Merkle proof from Witness Merkle root to witness data.

So if a light node wants proof full transaction data (including witness) is in a block, the proof would be significantly larger than currently.
Suppose you added a requirement that transactions must be ordered in the tree by their txid in order to make non-existence proofs compact. You don't need the txid to be part of the merkle structure to prove violations of this rule, but if it's not part of it then you need the full transaction data of two transactions in the proof.

Right now that's not a big deal, but if transaction size limits come off then it could become significant in pathological cases.
That makes sense, I hadn't considered the large transaction pathological case.

On the other hand, I like the idea of designing the "common" use case to make verification of security guarantees easy and part of normal operation. In this case, I am thinking of light nodes processing full transaction data during normal operation, which makes verification of input scripts easy, instead of having to request input scripts separately (especially if it means having to process three Merkle proofs).

I am going to try to write up some kind of "design goal" statement for compact fraud proofs, maybe with a definition of the adversarial case. I think that would help clarify the design considerations. Do you know if such a thing already exists?
 

Justus Ranvier

Active Member
Aug 28, 2015
875
3,746
I am going to try to write up some kind of "design goal" statement for compact fraud proofs, maybe with a definition of the adversarial case. I think that would help clarify the design considerations. Do you know if such a thing already exists?
That's what I was going for in the gist you linked in the OP. I do not know of any other example.
 

Peter R

Well-Known Member
Aug 28, 2015
1,398
5,595
I'm a new-comer to fraud proofs. I wanted to say that reading @Justus Ranvier's write-up here is an excellent way start to understand the problem. Thanks for putting that together, Justus.

Am I correct that the biggest challenge is proving the non-existance of a given output?

But does this really matter in practice? If my SPV node receives an alert that an invalid block was mined, due to a TX that spends a non-existant output, and if none of the full nodes I'm connected to can prove otherwise (that the output does in fact exist [and no node can prove that it's been spent]), doesn't it make sense to put that block on "probation" and just wait until the honest majority takes back control of the network?

(I suppose UTXO committments--even if they were "laggy"--would make it much easier to prove non-existance on an output.)
 
  • Like
Reactions: freetrader

Justus Ranvier

Active Member
Aug 28, 2015
875
3,746
But does this really matter in practice? If my SPV node receives an alert that an invalid block was mined, due to a TX that spends a non-existant output, and if none of the full nodes I'm connected to can prove otherwise (that the output does in fact exist [and no node can prove that it's been spent]), doesn't it make sense to put that block on "probation" and just wait until the honest majority takes back control of the network?
An invalid block notice that does not contain a proof of invalidity is a DoS attack vector, and potential a way for one miner to interfere with the propagation of competing blocks.
 
  • Like
Reactions: freetrader

Mengerian

Moderator
Staff member
Aug 29, 2015
536
2,597
I have been pondering a good definition for fraud proofs. I am not sure if my ideas are fully formed, but it seems like a good time to share what I have been thinking of.

In the gist by @Justus Ranvier I think the key phrase defining fraud proofs is:
Fraud proofs are a technique which provide honest full nodes the capability to conclusively demonstrate that chain is invalid regardless of the amount of proof of work backing the invalid chain
To clarify why we would want fraud proofs, I also think some kind of statement of the motivation is useful, something like:

"The purpose of fraud proofs is to shift the balance of trust for light clients from having to trust a majority of hash power to be honest, to having to trust that only one full node will be honest"

I think these are a good start at defining statements, and do a good job at getting the concept across. But I'm not completely satisfied with them, I don't think they are precise enough.

I think the main problem has to do with what qualifies as a "full node". I can imagine scenarios (like the one I described previously) were a colluding hash power majority can withhold data from other full nodes, or exclude them from the network in some manner, in a way that makes them incapable of providing fraud proofs. Even if they can detect that the exclusion is happening, they may not be able to prove it to others.

Here is my try at a definition:

"Fraud proofs provide a mechanism by which IF a group of mining nodes can collude to trick light nodes into accepting an invalid majority proof-of-work chain, THEN it will only take one of the colluding nodes to be able to prove the fraud."

And as a corollary: "In a system of working fraud proofs, a light node, during routine operation, will be able to detect that the system is functioning in a manner that ensures fraud proofs are enabled"

I believe this corollary necessarily follows from the first statement, but I have to flesh out this concept some more. Basically, I think if light nodes need to be given extra data to prove that colluding attackers have changed the data structure in blocks in a way that disables fraud proofs, then the colluding miners can withhold this data from all non-colluding nodes. This means that even if the non-colluding nodes know that they are missing data, they can't prove that to others. So the data needs to be structured in a way that light clients can automatically detect if changes are made that make fraud proofs impossible.

A good example of this is that the current Merkle tree structure makes it so that all light nodes have to process all transaction data when checking that transactions are in a block. This makes it impossible to withhold input scripts without it being detected during routine operation.

In these definitions, I have not defined what is meant by "light client". I think it's possible that there could be different types of light client, some of which function is a way that ensures they can be protected by fraud proofs, and some which cannot. For example, a client which only processes the legacy portion of Segregated Witness transactions, and not the witness portion, would be vulnerable to "invalid input script" fraud in a way that can't be proven with fraud proofs.
 
Last edited:

Mengerian

Moderator
Staff member
Aug 29, 2015
536
2,597
But does this really matter in practice?
doesn't it make sense to put that block on "probation" and just wait until the honest majority takes back control of the network?
The threat model for fraud proofs is that the hash power majority is building a chain which the detecting node would consider invalid for some reason. So while in practice trusting the honest majority will usually work, the purpose of fraud proofs is to be able to rely on a weaker security assumption: that there only needs to be at least one honest node.
 

Peter R

Well-Known Member
Aug 28, 2015
1,398
5,595
An invalid block notice that does not contain a proof of invalidity is a DoS attack vector, and potential a way for one miner to interfere with the propagation of competing blocks.
Yes, I would agree that it's a DoS vulnerability. Not sure that's a show stopper to making incremental progress on improving SPV security.
Mengerian said:
The threat model for fraud proofs is that the hash power majority is building a chain which the detecting node would consider invalid for some reason. So while in practice trusting the honest majority will usually work, the purpose of fraud proofs is to be able to rely on a weaker security assumption: that there only needs to be at least one honest node.
Right. Let's say we assume my SPV wallet is connected to at least one honest node, and let's also assume that at least one honest node will answer all of my queries.

Imagine my SPV wallet receives an "alert" that the latest block is invalid. The reason claimed is that TX 123 spends non-existant Output 789.

Case 1: the alert is genuine

My SPV wallet pings the nodes it's connected to, asking if anyone can prove that output 789 actually exists. All nodes that respond acknowledge that indeed it doesn't exist. My SPV wallet therefore deems the block as likely invalid, notifies me that a bad block has been received, and ignores the block.

Outcome: works as intended.

Case 2: the alert is false

My SPV wallet pings the nodes it's connected to, asking if anyone can prove that output 789 actually exists.

Node A responds with a Merkle branch proof that output 789 does actually exists. My SPV wallet cross checks this with its local record of block headers, and concludes that output 789 did in fact exist at some point in time. Now my wallet has reason to doubt the alert.

My wallet then queries the nodes it is connected to see if anyone can prove that output 789 has been spent.

No one can prove it. My SPV wallet therefore deems the block as likely valid, bans the node that gave it the false alert, and continue normally.

Outcome: works as intended.

What is wrong with this (other than the potential DoS vector)?
 
Last edited:

Justus Ranvier

Active Member
Aug 28, 2015
875
3,746
What is wrong with this (other than the potential DoS vector)?
The DoS vector is the major problem. The number of bytes the honest nodes need to exchange to determine an alert is fake is more than the size of the alert message, which creates amplification.
 

Mengerian

Moderator
Staff member
Aug 29, 2015
536
2,597
Another way of describing the objection is that, in the scheme described, you wouldn't gain anything compared to running a full node. You would either have to somehow trust the node supplying the "fraud alert", which introduces the problem of a trusted third party, or you have to check every random "fraud alert" which could easily end up being as resource intensive (or more) as just running a full node in the first place.

Another consideration is that rejecting the longest proof-of-work chain is a risky thing for a node. With fraud proofs, you can have solid proof that fraud has occurred, so you can confidently reject that chain. You can also credibly threaten, before the fact, that you will reject a fraudulent chain, thus discouraging it from being created in the first place.

In the scheme described, you can end up in a sort of limbo state where you can't prove fraud, but you can't prove validity either. This makes it less certain that the node should reject the longest proof-of-work chain, and less effective as deterrent to producing an invalid chain.
 

Peter R

Well-Known Member
Aug 28, 2015
1,398
5,595
I agree with what you wrote and that it would be better if the SPV wallet didn't have to trust that at least one node it was connected to was honest. But aren't we already making that security assumption anyways? You wrote earlier:

"the purpose of fraud proofs is to be able to rely on a weaker security assumption: that there only needs to be at least one honest node."

So I'm still not convinced that from a practical perspective it really matters.

I haven't given this as much thought as you or Justus, but I'm not convinced that the DoS vector is that much of a problem either. If a node produces a fraudulent alert, other nodes won't propagate the alert (because they can see that it's a lie) and those nodes can ban the misbehaving node. If a node serves a fraud alert directly to an SPV wallet, yes, this takes some communication with the other nodes to sort out the issue, but then the SPV wallet would disconnect from the bad node, to reduce the chances of this happening again.
 
Last edited:
  • Like
Reactions: awemany

Mengerian

Moderator
Staff member
Aug 29, 2015
536
2,597
But aren't we already making that security assumption anyways?
Yeah, but the whole point of fraud proofs is to not have to rely on that security assumption.

In practical terms today, yeah, everything works as-is. We have SPV security for light wallets, and it works.

A working fraud proof system would allow future development where you could have nodes that don't have the full block chain, maybe they process only a shard, or prune their data, and can still have pretty much the same security as a full node. It would be an enabling technology for the system to massively scale in a secure and efficient way.
 
  • Like
Reactions: awemany

Mengerian

Moderator
Staff member
Aug 29, 2015
536
2,597
Your example got me thinking @Peter R

I guess I have always imagined fraud proofs in a scenario where nodes could validate sharded data, and be able to produce fraud proofs if they find anything amiss. I wonder if the definitions adequately cover this use case.

For example, you could have a node that probabilistically verifies 1% of transactions. Or a node could just verify transactions that it is interested in (say a large business). Then, if you had many thousands of such nodes, you could have confidence that any violations of validity rules would be caught, even if none of the nodes individually process all transactions. It is even conceivable that miners could avoid processing all transactions, and rely on large network of sharded validators. Such schemes rely on the capability to efficiently prove any fraud to the entire network.