Xthin is Robust to Hash Collisions
We have been thrilled by the popularity of Bitcoin Unlimited's article series on Xtreme Thin Blocks (Part 1, Part 2, Part 3, Part 4). To date, the articles have received over 14,700 unique views and over 5,500 reads. A byproduct of this excitement was confusion in the social media about the way the Xthin protocol deals with collisions of the short-hashes it uses to create its thin blocks. The purpose of this post is to explain that the protocol has always dealt with hash collisions in a safe way and that the purported "attack" described by Gregory Maxwell is only a minor nuisance that would neither hurt Bitcoin Unlimited nodes nor give any meaningful advantage to the perpetrator.
TL/DR: Optimistic Mining completely breaks Maxwell's 'attack.' But even without that, the other arguments in this post leave it busted and bruised.
On behalf of the Bitcoin Unlimited Research & Development Team, I invite Mr. Maxwell to make his "Xthin Collision Tool" public [1,2,3]. Both theory and empirical testing show that the Xthin protocol is robust to such hash collisions.
Hash Collisions and the Birthday Paradox
Currently, Xthin builds thin blocks using the first 8 bytes (64-bits) of each transaction ID. In the future, we intend to dynamically reduce this by using feedback to target a certain collision rate. But let's perform our analysis assuming 64-bit hashes.
The probability that any two transactions share the same first 64 bits is 1 / 2^64, or 1 part in about 18,000,000,000,000,000,000. To get a feel for the size of this number, it is approximately equal to the total number of grains of sand on earth, according to this estimate from the University of Hawaii.
Despite this enormous improbability, it is actually feasible (although difficult) to find two bitcoin transactions that share the same first 64-bits using only a consumer-grade computer, custom software, and some patience. What makes it feasible is related to something known as the Birthday Paradox. An example of this paradox is illustrated in the figure below: in Mathematica I generated 1,000 (10^3) numbers between 1 and 1,000,000 (10^6) at random and found that two of those numbers were the same. What are the odds of that!? Well if you repeat this experiment, you'll find a "collision" like this about 40% of the time. The fact that the odds are greater than what you might have expected is the reason it's referred to as a paradox.
The first part of the purported attack is to put your computer to work finding pairs of colliding double-spent transactions. To find a pair, you don't have to generate 2^64 transactions, but only about 10 billion (2^32/0.4) instead, as per the birthday paradox. 10 billion transactions consumes ~3 TB of disk space; producing 10 billion signed transactions and sorting through them to find a colliding pair--although not fast--is not impossibly-slow either. Indeed, finding a pair of transactions with the same first 64 bits is feasible.
So what is the purported attack?
The attack described by Maxwell imagines that a miner would generate a bunch of double spent colliding transaction pairs and continually propagate them across the network. Other miners would include up to one transaction from each pair in their blocks (they are valid, after all) and then--when their blocks are propagated outwards across the network--Bitcoin Unlimited nodes with the other half of the double-spent pair in mempool would encounter two transactions with the same short hash and wouldn't immediately know which one to pick.
But that's not easy
Even with the colliding transactions in hand, it is still difficult to pull off because you can't easily control which miners/nodes have which halves of the pairs. The attacker doesn't have a "God's-eye view" of the network.
But let's assume the attacker does have a God's-eye view and can successfully pull of this attack. It still doesn't hurt a Bitcoin Unlimited node. The node simply requests a "thicker" thin block built from longer hashes. Our testing described in Part 2 of our Xthin article series showed that an extra round trip adds another 0.6 sec to the mean propagation time for a Bloom filter false positive. An extra round trip due to a hash collision would add a similar amount of time; but let's call it 1 second to be conservative. Therefore, if a BU node encountered a collision, it would take perhaps 1.7 seconds rather than 0.7 seconds to download the block from its peer.
This is not a problem for non-mining nodes. As pointed out in the Cornell position paper, what matters is that non-mining nodes are able to consistently receive blocks within the 600 second block time--an extra second on a few hops does not meaningfully threaten a node's ability to keep up with the blockchain.
[continued below]
We have been thrilled by the popularity of Bitcoin Unlimited's article series on Xtreme Thin Blocks (Part 1, Part 2, Part 3, Part 4). To date, the articles have received over 14,700 unique views and over 5,500 reads. A byproduct of this excitement was confusion in the social media about the way the Xthin protocol deals with collisions of the short-hashes it uses to create its thin blocks. The purpose of this post is to explain that the protocol has always dealt with hash collisions in a safe way and that the purported "attack" described by Gregory Maxwell is only a minor nuisance that would neither hurt Bitcoin Unlimited nodes nor give any meaningful advantage to the perpetrator.
TL/DR: Optimistic Mining completely breaks Maxwell's 'attack.' But even without that, the other arguments in this post leave it busted and bruised.
On behalf of the Bitcoin Unlimited Research & Development Team, I invite Mr. Maxwell to make his "Xthin Collision Tool" public [1,2,3]. Both theory and empirical testing show that the Xthin protocol is robust to such hash collisions.
Hash Collisions and the Birthday Paradox
Currently, Xthin builds thin blocks using the first 8 bytes (64-bits) of each transaction ID. In the future, we intend to dynamically reduce this by using feedback to target a certain collision rate. But let's perform our analysis assuming 64-bit hashes.

The probability that any two transactions share the same first 64 bits is 1 / 2^64, or 1 part in about 18,000,000,000,000,000,000. To get a feel for the size of this number, it is approximately equal to the total number of grains of sand on earth, according to this estimate from the University of Hawaii.
Despite this enormous improbability, it is actually feasible (although difficult) to find two bitcoin transactions that share the same first 64-bits using only a consumer-grade computer, custom software, and some patience. What makes it feasible is related to something known as the Birthday Paradox. An example of this paradox is illustrated in the figure below: in Mathematica I generated 1,000 (10^3) numbers between 1 and 1,000,000 (10^6) at random and found that two of those numbers were the same. What are the odds of that!? Well if you repeat this experiment, you'll find a "collision" like this about 40% of the time. The fact that the odds are greater than what you might have expected is the reason it's referred to as a paradox.

The first part of the purported attack is to put your computer to work finding pairs of colliding double-spent transactions. To find a pair, you don't have to generate 2^64 transactions, but only about 10 billion (2^32/0.4) instead, as per the birthday paradox. 10 billion transactions consumes ~3 TB of disk space; producing 10 billion signed transactions and sorting through them to find a colliding pair--although not fast--is not impossibly-slow either. Indeed, finding a pair of transactions with the same first 64 bits is feasible.
So what is the purported attack?
The attack described by Maxwell imagines that a miner would generate a bunch of double spent colliding transaction pairs and continually propagate them across the network. Other miners would include up to one transaction from each pair in their blocks (they are valid, after all) and then--when their blocks are propagated outwards across the network--Bitcoin Unlimited nodes with the other half of the double-spent pair in mempool would encounter two transactions with the same short hash and wouldn't immediately know which one to pick.
But that's not easy
Even with the colliding transactions in hand, it is still difficult to pull off because you can't easily control which miners/nodes have which halves of the pairs. The attacker doesn't have a "God's-eye view" of the network.
But let's assume the attacker does have a God's-eye view and can successfully pull of this attack. It still doesn't hurt a Bitcoin Unlimited node. The node simply requests a "thicker" thin block built from longer hashes. Our testing described in Part 2 of our Xthin article series showed that an extra round trip adds another 0.6 sec to the mean propagation time for a Bloom filter false positive. An extra round trip due to a hash collision would add a similar amount of time; but let's call it 1 second to be conservative. Therefore, if a BU node encountered a collision, it would take perhaps 1.7 seconds rather than 0.7 seconds to download the block from its peer.
This is not a problem for non-mining nodes. As pointed out in the Cornell position paper, what matters is that non-mining nodes are able to consistently receive blocks within the 600 second block time--an extra second on a few hops does not meaningfully threaten a node's ability to keep up with the blockchain.
[continued below]
Last edited: