Xtreme Thinblocks ready for code review PR#10

theZerg

Moderator
Staff member
Aug 28, 2015
1,012
2,327
How does a node choose between including a transaction in a thin block and including just its hash?
 

Peter Tschipper

Active Member
Jan 8, 2016
254
357
@theZerg That's based on the bloom filter. The sender checks whether each transaction is NOT represented in the bloom filter and sends those transactions whole and intact, but ALL the tx hashes are sent regardless, in the correct order as they exist in the block.
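
To sketch that selection logic in C++ (this is illustrative only, not the code from the PR: the ThinBlock struct and BuildThinBlock are made-up names, and the Bitcoin Core primitives CBlock, CBlockHeader, CTransaction, CBloomFilter and uint256 are assumed from the surrounding codebase):

#include <vector>

// Illustrative sketch only; not the actual PR implementation.
// Assumes Bitcoin Core primitives: CBlock, CBlockHeader, CTransaction,
// CBloomFilter, uint256.
struct ThinBlock {
    CBlockHeader header;                  // the block header
    std::vector<uint256> vTxHashes;       // EVERY tx hash, in block order
    std::vector<CTransaction> vMissingTx; // full txs the peer's filter lacks
};

ThinBlock BuildThinBlock(const CBlock& block, const CBloomFilter& peerFilter)
{
    ThinBlock thin;
    thin.header = block.GetBlockHeader();
    for (const CTransaction& tx : block.vtx) {
        const uint256 hash = tx.GetHash();
        thin.vTxHashes.push_back(hash);    // hashes always go, in order
        if (!peerFilter.contains(hash))    // NOT represented in the filter:
            thin.vMissingTx.push_back(tx); // send it whole and intact
    }
    return thin;
}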
 

theZerg

Moderator
Staff member
Aug 28, 2015
1,012
2,327
Let me propose a theory which I am currently investigating: that it is less efficient to send the bloom filter than to omit it.

Either the node needs performance, in which case it is going to be "caught up" on the transactions in the network, or it does not care about performance (or is just starting up, an edge case that I don't think is important to optimize), in which case it can grab the transactions at its leisure.

A key use case for thin blocks is communication across the GFC (Great Firewall of China), in which case we want to make the maximum assumption possible about what the other side has...


Observationally, I have a node on the relay network which receives blocks first, so it is essentially broadcasting to 4 other nodes. Transaction counts on the other nodes are pretty similar (within about 2%), and all those nodes are also exchanging transactions. Yet I am qualitatively seeing very different compression ratios on the same blocks going to different nodes some of the time.

This could be because the bloom filter is lost in transit, arrives slowly (I forget, is this TCP?), is somehow reordered by the send logic, or is not processed fast enough or in order on the receive side.

Thoughts?
 

Peter Tschipper

Active Member
Jan 8, 2016
254
357
"Transaction counts on the other nodes are pretty similar (within about 2%), and all those nodes are also exchanging transactions. Yet I am qualitatively seeing very different compression ratios on the same blocks going to different nodes some of the time."

I haven't seen the same thing. However, sometimes when you're starting up a node it takes a while for the mempools to catch up and sync, and in our case, with all the spam going on the last few days, it can take 12 to 24 hours or more, with a lot of very big spam transactions out there. Consider that if you are missing just 4 spam transactions of 15KB each, that's an extra 60KB in the thinblock, which makes a big difference to the ratio. I think you will see very different and more consistent results, as I have, when traffic is more normalized or once your node has been running for several days.

"This could be because the bloom filter is lost in transit, arrives slower (I forget is this TCP?), is somehow reordered by the send logic, or is not processed fast enough or out of order in the receive."

I don't see how any of those events could happen, or how they would have an impact on thinblock size.

One thing that you are touching on, though, is that if we could have mempools in perfect sync then yes, we could drop the bloom filter without incurring a re-request. I was thinking about this as an approach to fast block relay, a "Lightning block relay" for miners: with an advanced mempool sync we could drop the bloom filter and just send headers first (no inv) with the coinbase transaction and the tx hashes, basically a thinblock without transactions. The only requirement is that the receiving node would have to run a very large mempool.
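
As a rough sketch, the message for that miner-oriented mode might carry something like the following (hypothetical names, assuming Bitcoin Core types; this is the idea, not a spec):

#include <vector>

// Hypothetical "thinblock without transactions" for mempool-synced peers.
struct CThinBlockHeaderOnly {
    CBlockHeader header;            // announced headers-first, no inv round trip
    CTransaction coinbase;          // the coinbase can never be in any mempool
    std::vector<uint256> vTxHashes; // everything else comes from the mempool
};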

But back to your question: yes, we could drop the bloom filter now too, but then you would have to do a re-request every time, which depending on network activity can be quick or take a while. With our single-threaded network layer it's hard to predict how fast those re-requests are; I've seen them vary quite widely at times when the network gets busy. In the end that's really the value of using the bloom filter: it prevents re-requests and keeps block propagation moving forward as quickly as possible.

@theZerg That is food for thought though. Maybe in the next iteration we could make use of the headers-first functionality in v12 and make it Thinblocks-first (when we announce the block) but without the transactions, just the hashes and the coinbase, and then do whatever re-request is needed. Then we could drop the inv/getdata with bloom filter and be at least as fast, and sometimes faster if all the tx's are in the mempool.
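
On the receive side, the flow might look something like this (again just a sketch under the same assumptions; FindMissingHashes is a hypothetical helper, and CTxMemPool::exists is the standard mempool lookup):

#include <vector>

// Look up each announced hash in our mempool; anything missing gets
// re-requested from the announcing peer.
std::vector<uint256> FindMissingHashes(const std::vector<uint256>& vTxHashes,
                                       const CTxMemPool& mempool)
{
    std::vector<uint256> vMissing;
    for (const uint256& hash : vTxHashes) {
        if (!mempool.exists(hash))
            vMissing.push_back(hash); // costs a re-request round trip
    }
    return vMissing;
}

If the returned list is empty the node can reconstruct the block immediately; otherwise it pays the re-request latency described above.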
 

Peter R

Well-Known Member
Aug 28, 2015
1,398
5,595
@Peter Tschipper, @theZerg, @sickpig:

Awesome work guys! It is great to see this project begin to bear some fruit!!

I was wondering if we could easily repeat the Decker and Wattenhofer "Information Propagation in the Bitcoin Network" study for Xthin blocks.

Our methodology could be nearly identical, I think.

It would be interesting to see what our results looked like, especially if we could get BU nodes all over the world, including within the Great Firewall of China.

It would be super cool if we could experiment with 1MB+ blocks as well by creating a global test-net, but even making these measurements for blocks <= 1MB would be quite useful.

 
Reactions: sickpig

sickpig

Active Member
Aug 28, 2015
926
2,541
Updated stats.
Observation period: 2016-02-17 11:18:24 UTC to 2016-02-18 13:25:47 UTC
tb = thinblock

mined blocks: 180
tb sent: 174
tb received: 200 (multiple peers could serve the same block to my node)

Distribution of size (block) / size (received thinblock) = ratio:

Min. 1st Qu. Median Mean 3rd Qu. Max.
1.036 9.200 21.870 27.980 41.520 130.500


Distribution of size (block) / size (sent thinblock) = ratio:

Min. 1st Qu. Median Mean 3rd Qu. Max.
1.011 10.250 35.130 33.970 51.330 167.300


Just as food for thought, I've reported here the blocks (both received and served) with a ratio lower than 2.

The first thing that comes to mind is that the higher the uptime, the lower the frequency of blocks with a ratio below two. This seems to fit @Peter Tschipper's theory about node mempool convergence.

This also seems to happen when 2 blocks are mined one right after the other (especially for served blocks).


> rec[sent$ratio<2,]
time height peer thin_size block_size ratio prev_time delta_time
3 2016-02-17 11:43:24 398837 66.222.25.72 31556 999988 31.68930 2016-02-17 11:24:32 1132 secs
16 2016-02-17 13:39:37 398850 66.222.25.72 25205 859616 34.10500 2016-02-17 13:30:16 561 secs
43 2016-02-17 18:47:17 398878 169.45.95.196 41210 914472 22.19050 2016-02-17 18:35:35 702 secs
44 2016-02-17 18:48:48 398879 169.45.95.196 13350 191432 14.33950 2016-02-17 18:47:17 91 secs
52 2016-02-17 20:03:01 398887 169.45.95.196 23325 999998 42.87240 2016-02-17 19:31:19 1902 secs
58 2016-02-17 20:28:12 398893 169.45.95.196 31374 514248 16.39090 2016-02-17 20:25:30 162 secs
65 2016-02-17 21:33:28 398900 169.45.95.196 218546 949174 4.34313 2016-02-17 21:27:05 383 secs
66 2016-02-17 21:35:48 398902 169.45.95.196 5128 272498 53.13920 2016-02-17 21:33:28 140 secs
71 2016-02-17 22:30:37 398907 169.45.95.196 38796 996171 25.67720 2016-02-17 22:07:46 1371 secs
72 2016-02-17 22:56:29 398908 169.45.95.196 53236 934265 17.54950 2016-02-17 22:30:37 1552 secs
136 2016-02-18 08:27:27 398975 169.45.95.196 142666 341674 2.39492 2016-02-18 08:23:28 239 secs
140 2016-02-18 08:58:40 398979 169.45.95.196 31388 686401 21.86830 2016-02-18 08:55:45 175 secs

> sent[sent$ratio<2,]
time peerid thin_size block_size ratio prev_time delta_time
3 2016-02-17 11:24:57 62 711632 996176 1.39985 2016-02-17 11:24:32 25 secs
16 2016-02-17 14:50:05 367 567977 996174 1.75390 2016-02-17 14:50:01 4 secs
43 2016-02-17 18:15:18 147 868445 998084 1.14928 2016-02-17 18:15:11 7 secs
44 2016-02-17 18:18:44 147 614748 999935 1.62658 2016-02-17 18:15:18 206 secs
52 2016-02-17 18:54:41 252 740895 749191 1.01120 2016-02-17 18:47:18 443 secs
58 2016-02-17 19:17:44 252 383653 658177 1.71555 2016-02-17 19:15:22 142 secs
65 2016-02-17 20:03:50 252 53767 66225 1.23170 2016-02-17 20:03:01 49 secs
66 2016-02-17 20:03:51 3 53767 66225 1.23170 2016-02-17 20:03:50 1 secs
71 2016-02-17 20:14:42 3 631967 995173 1.57472 2016-02-17 20:11:29 193 secs
72 2016-02-17 20:14:43 252 743593 995173 1.33833 2016-02-17 20:14:42 1 secs
136 2016-02-18 04:14:17 3 186076 275390 1.47999 2016-02-18 04:12:14 123 secs
140 2016-02-18 04:33:12 1126 338741 405765 1.19786 2016-02-18 04:33:12 0 secs
 

Peter Tschipper

Active Member
Jan 8, 2016
254
357
@sickpig The low ratios when a mined block comes right after the previous one occur because those blocks are typically very small and have few transactions, so the savings are not as great.
 

sickpig

Active Member
Aug 28, 2015
926
2,541
@Peter Tschipper It makes total sense, I just wanted to state it explicitly. The only thing that doesn't fit my mental model is that I thought it would have been the primary cause.