BUIP017 (passed): Datastream Compression

solex

Moderator
Staff member
Aug 22, 2015
1,558
4,693
I've heard this line before: Fast Relay Network is so good, let's not improve anything! What I don't understand is how you can say "small blocks because bandwidth" at the same time. :)
Datastream compression also helps with the alleged 88% of traffic which Maxwell says is not block data; even a 20% reduction of that is comparable to Xthin's reduction of bandwidth overhead. The bonus is that DC helps with both burst and non-burst data.
 
Last edited:

Peter Tschipper

Active Member
Jan 8, 2016
254
357
@solex It reduces the 88% (the alleged). By how much? It depends on how many historical blocks and tx's are sent/received. I don't know what the totals are; it will be a good question to answer as we get to testing this and tracking the actual daily savings. Overall I would say we'll save about 15% to 20% with DC plus another 15% to 20% from Xthin, but I don't have hard data.
 

solex

Moderator
Staff member
Aug 22, 2015
1,558
4,693
@Peter Tschipper
Thanks, those are useful estimates.
I was reading the reddit comments about DC and you are doing so much to build goodwill towards BU.
 
  • Like
Reactions: Peter R

Inca

Moderator
Staff member
Aug 28, 2015
517
1,679
Once again this is great stuff. BU and Classic need to start integrating these fantastic optimisations and leave Core behind.
 

Peter Tschipper

Active Member
Jan 8, 2016
254
357
Hello all,

A reference client for Datastream Compression is available for anybody who wants to compile and try it out. You'll need to
connect to a client that supports compression; these can be found on Bitnodes by searching for protocol version 80002.


Reference Client:

You will need LZO 2.09 to compile this.
https://github.com/ptschip/bitcoin/tree/BUIP017_compress


Bitnodes:

https://bitnodes.21.co/nodes/?q=80002


The client uses LZO compression with three settings: 0 is off, 1 is fast, and 2 is maximum compression. Be sure to run the compression.py Python test script after you compile to make sure it's working correctly for you.
(./rpc-tests.py compression)
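For anyone curious what the three settings mean in practice, here is a minimal sketch. LZO bindings aren't in the Python standard library, so this uses zlib purely as a stand-in; the level mapping and the `compress_payload` helper are illustrative, not the actual client code (which uses LZO's fast and maximum modes).

```python
import zlib

# Hypothetical mapping of the -compression setting onto zlib levels,
# purely to illustrate the off/fast/maximum trade-off.  The real client
# uses LZO, not zlib, and this helper is not its actual code.
LEVELS = {0: None, 1: zlib.Z_BEST_SPEED, 2: zlib.Z_BEST_COMPRESSION}

def compress_payload(payload: bytes, level: int) -> bytes:
    """Return the payload untouched at level 0, else compressed."""
    setting = LEVELS[level]
    if setting is None:
        return payload
    return zlib.compress(payload, setting)

sample = b"\x01\x00" * 4096              # repetitive data compresses well
print(len(compress_payload(sample, 0)))  # 8192: level 0 leaves it alone
print(len(compress_payload(sample, 2)) < len(sample))  # True
```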


Additional Findings:

1) Most of the compression benefits come from nodes doing IBD from our node. It's a frequent occurrence and is the biggest bandwidth consumer by far; next is transactions and then inv messages. However, we don't compress inv's, as they are not compressible with LZO; there will be a follow-up BUIP that will hopefully deal with those in the near future. Also, a Core bug was fixed that was constantly downloading headers from unsync'd nodes, which also accounted for a significant amount of bandwidth savings.

2) Bloom filters could not be compressed to any extent using LZO compression. We'll need some kind of adaptive range encoder or arithmetic encoding, for which there is no suitable portable open-source implementation at this time. It would be an interesting project for someone to produce such a compressor in C/C++ or asm.
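To see why an LZ77-family compressor gets nothing out of a loaded bloom filter, compare near-random bytes against repetitive data. This sketch uses zlib as a stand-in for LZO and deterministic hash output as a stand-in for filter bits; none of it is the actual client code.

```python
import hashlib
import zlib

# Stand-in for a dense bloom filter: 2048 pseudo-random bytes.
bloom_like = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(64))
repetitive = b"deadbeef" * 256  # 2048 bytes of repeating data

# An LZ-style compressor finds no back-references in random-looking bits,
# so the "compressed" output is actually slightly larger than the input.
print(len(zlib.compress(bloom_like, 9)) >= len(bloom_like))  # True
print(len(zlib.compress(repetitive, 9)) < len(repetitive))   # True
```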


Updates to getnetworkinfo:

getnetworkinfo now reports the Xthin % compression over the last 24hrs
as well as datastream compression stats.

Thinblock stats were also updated to include outgoing xthins, since there is a significant
bandwidth savings from those, which we were not factoring in before.
*Potential compression refers to how much additional compression would be realized had the
other connected peers also been supporting compression.

Sample results from getnetworkinfo:

"thinblockstats": {
  "enabled": true,
  "summary": "326 thin blocks have saved 214.99MB of bandwidth",
  "summary": "Compression (last 24hrs) is: 94.8%"
},
"compressionstats": {
  "enabled": true,
  "cmp level": 2,
  "summary": "Compression has saved 8.95MB of bandwidth",
  "summary": "Compression is: 20.1%",
  "summary": "Potential Compression could save an additional 354.16MB of bandwidth"
},
 
  • Like
Reactions: Chronos

Christoph Bergmann

Also, a Core bug was fixed that was constantly downloading headers from unsync'd nodes, which also accounted for a significant amount of bandwidth savings.
Great! Can you give some numbers?

And am I right that this again only comes into effect if you are connected with other nodes that do this?

I also have a question, maybe it's stupid, but ... is it a compression like the html/css compression or like .zip / .tar? So, does it make it more difficult for ISPs to see that you are sending and accepting bitcoin-things? And if so, is it possible to put a virus in the compressed packet that auto-executes if you decompress it?
 

Dusty

Active Member
Mar 14, 2016
362
1,172
Another related question: could it be useful to store compressed data on the disk?
 

Peter Tschipper

Active Member
Jan 8, 2016
254
357
@Christoph Bergmann

Yes, you have to be connected to another node that supports compression and has it turned on. It's a config option: setting -compression=0 turns off compression/decompression entirely.

Compression rates are generally 20%, sometimes up to 27% for the larger full blocks.

And yes it is .zip type compression but done with lzo.

I don't think it will make it more difficult for ISPs to see anything, since the message headers are not compressed; only the data portion of the message gets compressed.

As for viruses: you could put a virus in anything, but it needs to get executed. The only way to do that in Bitcoin would be to generate a buffer overflow. LZO has been around for 20 years and there are no known overflow errors in it. There was an integer overflow found a couple of years back, which was patched, but it could never be used for anything other than causing an application to lock up.
Another related question: could it be useful to store compressed data on the disk?
I believe the UTXO set is already compressed using Snappy compression, but blocks I don't think so; in any case, you can already compress your blocks by creating a compressed disk or compressed folder using your OS.
 
  • Like
Reactions: sickpig

adamstgbit

Well-Known Member
Mar 13, 2016
1,206
2,650
Does this work with Thinblocks?
It seems to me compressing blocks and thinblocks would be mutually exclusive, and since thinblocks achieve an avg of 90% reduction... this compression idea seems like a moot point.

Also, I don't understand how you can actually get 20% compression on a block.
Aren't blocks simply full of <500-byte TXs, which are basically perfectly random data that cannot be compressed?

Say a thinblock is being sent from one peer to the other and they need to send 100 <500-byte TXs; you're telling me concatenating and compressing these TXs would yield ~7.83% compression? Really!?

I think 7-20% bandwidth savings is huge; if it can be married with thinblocks then this is very worthwhile.
 

Peter Tschipper

Active Member
Jan 8, 2016
254
357
@adamstgbit The short answer is yes, it does work with thinblocks, in some cases. But if an xthin is truly thin, meaning it only has 1 or 2 tx's added to it, then there is very little to compress, since most of the xthin is just tx hashes which can't be compressed very well. However, xthins are not always thin. In two cases: 1) when the node first starts up, xthins can be almost as large as a full block, and 2) sometimes even during normal times, when the system is getting a lot of large spam-type tx's (and over the weekend), we tend to get larger xthins that are only 30 to 50% compressed. In those cases DC helps to compress the xthin further.

On your second question, you are correct: tx's <500 bytes don't compress well or at all, and unfortunately most tx's are <500 bytes. So what we do in DC is concatenate tx's together, by first concatenating the inv/getdata requests and then concatenating all the tx's in our inv queue before compressing them. This yields good compression rates and also saves the 24-byte message header, 50-byte TCP ACK, and 20-byte IP header per added tx. That's almost 100 bytes of savings per tx just from ACKs and headers. All that combined, and after compression, I typically see between 20 and 25% compression total on average.

Why does concatenating work when a single <500-byte tx won't compress? That has to do with the lack of repeating data in a small, single tx. But combine them into one block of data and you start to have repeating patterns that the compressor can use. The data is serialized and in binary format, so it doesn't compress as well as a text file; but then serializing data is also a form of compression, so we are effectively compressing already-compact data and still getting 20% out of it when done right.
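A rough illustration of the concatenation effect, assuming zlib as a stand-in for LZO and synthetic 61-byte "transactions" (the `fake_tx` helper is hypothetical, not real serialized tx data): a single small record carries compressor overhead and no repetition, while a batch exposes the shared structure.

```python
import hashlib
import zlib

def fake_tx(i: int) -> bytes:
    """Synthetic 61-byte record mimicking a small tx: a shared version
    field and script template, plus unique hash-derived bytes."""
    version = b"\x01\x00\x00\x00"
    script = (b"\x76\xa9\x14"
              + hashlib.sha256(str(i).encode()).digest()[:20]
              + b"\x88\xac")
    txid_ish = hashlib.sha256(str(i).encode() + b"x").digest()
    return version + script + txid_ish

txs = [fake_tx(i) for i in range(100)]

single_ratio = len(zlib.compress(txs[0], 9)) / len(txs[0])
batch = b"".join(txs)
batch_ratio = len(zlib.compress(batch, 9)) / len(batch)

# One tiny record won't shrink, but the concatenated batch does better
# because the repeated version/script bytes become back-references.
print(batch_ratio < single_ratio)  # True
```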

EDIT: come to think of it, that doesn't include all of the savings. There are also additional savings for every inv that gets concatenated. Because Bitcoin uses TCP_NODELAY, every message goes out as soon as it's put in the buffer, which incurs the additional overhead of the 20-byte IP header and TCP ACK as well as the message header. So again we save 94 bytes for each inv/getdata request that gets bundled. Although that's not strictly file compression, we do save those bytes as well.
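The per-message arithmetic works out like this (the figures are the ones quoted above, a 24-byte message header, a 20-byte IP-level header, and a ~50-byte TCP ACK; the helper is purely illustrative):

```python
# Per-message overhead figures quoted in the post above (assumptions,
# not measured on a live node).
MSG_HEADER_BYTES = 24
IP_HEADER_BYTES = 20
TCP_ACK_BYTES = 50

def bundling_savings(n_messages: int) -> int:
    """Bytes of overhead avoided by sending n small messages as one:
    the fixed per-message cost is paid once instead of n times."""
    per_message = MSG_HEADER_BYTES + IP_HEADER_BYTES + TCP_ACK_BYTES
    return (n_messages - 1) * per_message

print(bundling_savings(2))    # 94 -- matches the ~94 bytes per bundled inv
print(bundling_savings(100))  # 9306 saved when 100 invs share one message
```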
 
Last edited:
  • Like
Reactions: lunar and solex

adamstgbit

Well-Known Member
Mar 13, 2016
1,206
2,650
@Peter Tschipper

I see thanks for making it clear.

I don't understand how Core still has so much support. The BU team is definitely proving itself. If all nodes were BU nodes using all the improvements you guys have made, bandwidth requirements would drop dramatically!
 
  • Like
Reactions: solex and Dusty

Dusty

Active Member
Mar 14, 2016
362
1,172
I don't understand how Core still has so much support. The BU team is definitely proving itself. If all nodes were BU nodes using all the improvements you guys have made, bandwidth requirements would drop dramatically!
From what I can see, the problem is PR: since BU does not have a $100M propaganda machine like BS, there is very little knowledge of the work being done here.
So it's up to us to advertise those features every time we post outside this forum :)
 

bitcartel

Member
Nov 19, 2015
95
93
Some test data, syncing a fresh install, connected only to BU nodes with data compression:

"blocks": 209175,
...
"summary": "Datastream Compression has saved 414.01MB of bandwidth",
"summary": "Datastream Compression : 26.4%",
 

solex

Moderator
Staff member
Aug 22, 2015
1,558
4,693
Up to block 209175 (sometime in late 2012).