Bloom filter for every transaction, not just blocks
I got this idea yesterday to optimize normal transaction traffic on a node. It will most likely not work for some fundamental reason I haven't understood, or because of a premise I got wrong. It might have been proposed by somebody else before, but I'm not aware of it.
I know the normal traffic on a node is not considered a bottleneck, but it would still be nice to reduce it.
It’s
100% inspired by Xthin, and I hope
@Peter Tschipper would take a look at it and give a comment, but you can all join in to kill the idea. The sooner it’s killed, the less time wasted on a useless idea.
I am/was a programmer, but I have never looked at the bitcoin code. I have never worked on an open source project, and I will not be able to write the code for this idea.
These are two important premises for the idea to work; one or both of them are probably wrong:
1) A node sends the same transaction to several nodes that have already received it.
2) A node receives the same transaction several times from different nodes.
The idea is this:
A node first sends a part of a hash of the transaction. (That's what's done in Bloom filters, right? The partial hash is long enough to make it statistically pretty unique. I haven't thought about collisions yet.)
If the receiving node hasn’t seen this transaction yet, it asks for the full transaction and gets it.
And that’s pretty much it.
Please kill the idea, not me, guys!
@Norway You're on the right track there, but a little behind the curve... it's already been done. Core has already implemented a rolling bloom filter to make sure we don't request the same transactions twice. You get extra points, though, for coming up with the idea yourself!
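For anyone who hasn't seen one: a rolling Bloom filter remembers recently seen items and silently forgets old ones, so it stays bounded in size. The sketch below is a generic two-generation version in Python with arbitrary sizes, not Core's actual implementation.

```python
import hashlib

class RollingBloom:
    # Two generations of bit arrays; when the current one fills up,
    # the oldest is dropped, so old entries eventually expire.
    def __init__(self, bits=1 << 16, hashes=4, gen_capacity=5000):
        self.bits, self.hashes, self.cap = bits, hashes, gen_capacity
        self.gens = [bytearray(bits // 8), bytearray(bits // 8)]
        self.count = 0  # insertions into the current generation

    def _positions(self, item: bytes):
        # Derive `hashes` bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            h = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(h[:8], "big") % self.bits

    def insert(self, item: bytes):
        if self.count >= self.cap:  # rotate: discard the oldest generation
            self.gens = [bytearray(self.bits // 8), self.gens[0]]
            self.count = 0
        for pos in self._positions(item):
            self.gens[0][pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def contains(self, item: bytes) -> bool:
        # A hit in either generation counts; false positives are possible,
        # false negatives are not (until an entry rolls out).
        return any(
            all(g[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
            for g in self.gens
        )
```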
I just can't write off this idea yet.
@freetrader just showed me some data from his BU node. In one month, January this year, it received 79.62 GiB of data and transmitted 43.26 GiB of data.
Let's just look at the received data.
79.62 GiB per month is about 18.5 MB per 10 minutes, i.e. per average block interval.
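The arithmetic, for anyone who wants to check it (January has 31 days, i.e. 4,464 ten-minute intervals):

```python
GIB = 1024 ** 3
received = 79.62 * GIB       # bytes received in January
intervals = 31 * 24 * 6      # ten-minute intervals in 31 days
print(received / intervals / 1024 ** 2)  # ~18.3 MiB per block interval
```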
Why this much?
I just have to speculate here, but I think the real reason is that a node receives the same transaction multiple times from different nodes. What a waste!
I know there is a gossip protocol at work here, and I don't know the details of how it works. But it delivers the same transaction to a node several times (from different nodes).
I'm not trying to change or reinvent the gossip protocol. But maybe we could change what the gossip is.
I'll repeat my basic idea here:
A node should never send a full transaction, just a partial hash of it. If the receiving node has never seen this transaction before, it requests the full transaction and receives it. That's it.
When a transaction is broadcast for the first time, this adds both traffic and propagation time for the first few nodes in the network that have never seen the transaction, because of the extra partial hash and the extra round trip. But this changes dramatically as the transaction becomes known to a growing number of nodes in the network.
I think this could reduce the received data to approximately 1/5 of what it is today: from 18.5 MB received per block to roughly 3 MB per block. You only receive the full transaction once!
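A back-of-envelope model of that 1/5 figure, where every number is an assumption: the average transaction size, how many peers currently deliver each transaction, and the id length.

```python
TX_SIZE = 500      # assumed average transaction size, bytes
DUPLICATES = 5     # assumed: each tx currently arrives from ~5 peers
ID_SIZE = 5        # proposed partial-hash length, bytes

today = DUPLICATES * TX_SIZE               # 2500 bytes received per tx
proposed = DUPLICATES * ID_SIZE + TX_SIZE  # 525 bytes: 5 ids + 1 full tx
print(proposed / today)                    # 0.21, i.e. roughly 1/5
```

The ratio depends almost entirely on how many duplicate copies a node receives today, which is exactly the number I'm speculating about above.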
About the length of the partial hash and collisions:
The partial hash should be as short as possible to save space/traffic, yet still long enough to be statistically unique among the other transactions. It could be made dynamic, based on statistics of how many transactions are in the mempool, but that introduces more complexity.
So I think the length should be maybe 4 or 5 bytes. (OMG, not another magic number!)
With 4 bytes, you get 4,294,967,296 (2^32) IDs.
With 5 bytes, you get 1,099,511,627,776 (2^40) IDs.
In a global on-chain coffee cup world where bitcoin is the only currency, 4 bytes would create collisions every day. But a single user will likely never experience a collision in his or her lifetime.
And what's the consequence of a collision? Your transaction is simply not picked up by the network. After the other colliding transaction is out of the mempool, just send it again.
Dear @Peter Tschipper, please take another look at this. I know you have looked at it once, and I don't want to waste your time. But a node is receiving and transmitting a lot more data than it needs to today.
Cheers!
EDIT: In the example of the global bla bla bla world and a 4-byte ID, I think people will experience collisions from time to time. I haven't done the statistical calculations on this; maybe 5 or even 6 bytes would be better.
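For what it's worth, the standard birthday approximation gives a feel for it. The number of transactions simultaneously in the mempool is a pure guess for the coffee cup scenario:

```python
import math

def collision_probability(n_txs: int, id_bytes: int) -> float:
    # Birthday approximation: P ~= 1 - exp(-n^2 / (2 * 2^(8*id_bytes)))
    space = 2 ** (8 * id_bytes)
    return 1 - math.exp(-n_txs ** 2 / (2 * space))

# Assumed: 1,000,000 transactions in the mempool at once (a guess).
for b in (4, 5, 6):
    print(b, collision_probability(1_000_000, b))
```

With that assumed load, 4 bytes collides almost certainly, 5 bytes roughly a third of the time, and 6 bytes only rarely, which supports the suspicion above.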