BUIP010 (passed): Xtreme Thinblocks

hfinger

New Member
Jan 24, 2016
2
4
I tried xthinblocks using the 0.12bu branch in the previous two days. It seems there is some problem with the bloom filter parameters which leads to a lot of re-requests (and therefore a second round-trip) when the memory pool is very large. Interestingly, this leads to the fact that the first 20 blocks directly after starting the node are received faster with less probability of re-requests and the performance degrades over time. The following shows my two tests (one from yesterday and one from the day before yesterday):

Here is a chart of the number of transactions that needed to be re-requested for each block between height 400672 and 400772 and the size of the memory pool of the node:

Only 30% of the first 20 blocks needed re-requests and the median block processing time was 0.84 seconds.
But 66% of blocks 21-100 after node startup needed re-requests and the median block processing time was 1.23 seconds.

Here another sample where I again restarted the node to clear the mempool (block height 400823 to 400923):

Only 5% of the first 20 blocks needed re-requests and the median block processing time was 0.96 seconds.
But 66% of blocks 21-100 after node startup needed re-requests and the median block processing time was 1.04 seconds.

I would suggest to increase the bloom filter size considerably to reduce the probability of a second round-trip. Another possibility would be to don't add transactions to the bloom filter that were in the mempool for a long time. Because the probability that very old transactions will be included in the next block is much lower in comparison to very recent transactions. These very old transactions are increasing the probability of false-positives for ALL transactions. That can be seen directly in my two test cases where nodes seem to have better performance if they were just started up, because they don't have these old transactions in the mempool. Maybe I do some more specific testing of other parameters in the next days.

---

Btw, there seems to be a bug in the calculation of the compression ratio. All these re-requests were not counted towards the compression ratio, which invalidates many of the statistics that were posted on reddit in the previous week. Definitely re-requested transactions have to be counted towards the compression ratio, because without these transactions the xthinblock cannot be used and is totally useless. I created a pull request where this is fixed. This is my first pull request and I hope it adheres to all requirements.
 

solex

Moderator
Staff member
Aug 22, 2015
1,558
4,695
@hfinger this is great work.
Please reply to my PM if you wish to take up an invitation to our slack channel.
 

Peter Tschipper

Active Member
Jan 8, 2016
254
357
@hfinger We've been noticing that for a while now. The cause as you've pointed out is the very large memory pools we now have due to the back up in transactions. When the mempools normalize we won't see this issue, OR we can increase the size of the bloom filter. It's a trade off. I've been working on another approach which is to have a better mempool limiting in place which does two things: one keeps the bloom filters small and two reduces bandwidth use. It's somewhat analogous to what you described in not adding old tx's to the filter..what about not adding tx's that are not necessarily old, but probably not going to be mined? The problem with old tx's is that they can/do get mined as they sit around and their priority rises over time.

Here are some thoughts:

1) under normal conditions we won't and don't see these re-requests. If block size increases, this problem goes away.

2) it's important to keep bloom filters small since they can also increase the overall bandwidth if allowed to grow without bound.

3) re-requests are pretty fast generally and don't contribute to overall transmission time a great deal. saving bandwidth is great but overall transmission time is the main goal, being able to transmit a block of any size quickly over the network is the key to scalability in my view. Another 1/10 to 1 second isn't a big deal for p2p but would be for the miners.

4) node operators that want minimal re-requests and speed, can either run with a smaller mempool (100MB is the minimum) or run with a very large mempool (1GB) or more. (Running a mempool with the default 300MB is not optimal with all these tx's out there, i think we're disovering that now). The smaller mempool will have bloom filters with fewer false positives but the xthins will be slightly larger. On the other hand, If you run with a 1GB pool or larger and keep your tx's for longer (9days?) then you shouldn't be missing tx's (i haven't tried this approach, if someone could verify this it would be helpful, the default setting for eviction is 3 days so that should be increased as well as mempool size)

5) one other thing we could use is an eviction cache...if the mempool is very large and tx's are often getting evicted, then when we send out an xthin request and during that time we're waiting for the xthin many tx's could get evicted and will not be there when the xthin arrives causing a re-request. I had originally had this in place but it was slow and not well designed so i took it out for the time being. It's something again we won't ever need under normal conditions but occasionlly depending on mempool size is something we could be seeing here. is it worth putting in something that we'll never need if block size increases but has to be invoked for every single tx that enters the pool? I'm not sure but it's something I had planned to work on for the next release...

What to do right now, I don't think this is an emergency. Nothing is breaking, just being slowed down under adverse conditions. We're still doing very good but obviously we can do more optimizing.

Thanks again for such a good analysis, that's exactly the kind of work we need!
Join us on slack if you can, we're usually there in the mornings, it's a small group right now but a good one.
 

solex

Moderator
Staff member
Aug 22, 2015
1,558
4,695
@bitcartel Are you on slack yet? If you want to join the BU channels please PM me an email address.
 

Peter R

Well-Known Member
Aug 28, 2015
1,398
5,595
For those that haven't been paying attention, the community's positive reception to @Peter Tschipper's Xthin blocks has motivated Core to make a similar offering. They call their version compact blocks and describe the protocol in BIP 152.

I had seen many of the usual suspect praise compact blocks as being "quite a bit better than [X]thin," so tonight I decided to investigate further. If we ignore the "high bandwidth mode" [1] and focus on the apples-to-apples comparison of the product that competes with Xthin, then it seems to me that:

(Compact blocks) = (Xthin) - (Bloom filters).

With Xthin, the transmitting node uses the Bloom filter to determine which transactions to send by hash and which to send in full, to avoid a second round trip. AFAICT, with compact blocks, the transmitting node sends all the transactions by hash and just "hopes for the best." The novel use of Bloom filters is what made Xthin innovative, so I wasn't suprised to see it removed by Core...but I suspected that Core would have come up with something else equally innovative to take its place. It seems they haven't.

In testing, Core reports that a second round trip is required ~10% of the time, which is significantly worse than Xthin (Xthin requires a second round trip less than 1% of the time IIRC). Furthermore, I doubt that this 10% figure would be accurate "in the wild" without babysitting the nodes to ensure synchronized mempools.

So, am I missing something? Is there anything innovative or novel about Core's compact blocks?

[1] Which really defeats the purpose for bandwidth-conscious nodes and is something Unlimited is working on for latency-conscious nodes anyways (called XRelay).
 
Last edited:
  • Like
Reactions: freetrader

sickpig

Active Member
Aug 28, 2015
926
2,541

Peter R

Well-Known Member
Aug 28, 2015
1,398
5,595
@sickpig:

Regarding UDP, I'm confused because in the BIP they specifically refer to TCP performance in the context of compact blocks:



That being said, I agree UDP would be novel, if they do indeed bring it into production.

A similar comment I made moments ago on /r/btc seems to have awoken Saruman, although I can't really parse his comment in the context of BIP152. Is he perhaps working on something more advanced than what is described in the BIP?
 
Last edited:
  • Like
Reactions: solex

sickpig

Active Member
Aug 28, 2015
926
2,541
@Peter R

WRT UDP: I'm confused too, there's no a single mention of UDP in the BIP draft. Still the branch where Corallo's implemented it is named udp.

The actual branch that I linked above is named udp+wip and I'm sure, reading the code, that they are using UDP for relying blocks.

I've ROTFL at your reply to Gregory NIY/DIY Maxwell. Ho is it possible that you have bee already downvoted by at least 4 person by the way? What's happening to r/btrc these days :p

One last question: what's BIP152? never heard of it.
 
  • Like
Reactions: Peter R

Peter Tschipper

Active Member
Jan 8, 2016
254
357
I think that with Core's intended strategy of dropping the bloom filter they're going to find that they will be doing a lot of round trips. Keeping the mempools in sync is not very realistic and will only diverge further as transactions rates rise giving rise to more re-requests IMO. We should do a little study on how many tx's are missing and have to get sent with the xthin...that will tell us how many round trips we saved by using the bloom filter. Just taking a quick log at my logs, eyeballing it, it seems that almost 50% of the time there are tx's missing from the mempool and will cause Core's version to re-request those. It's not all that easy, as we found out, to keep the mempools in sync as sometimes tx's are not announced or they were rejected by the mempool limiter.

Furthermore the idea that they are doing less round trips is bogus, they are doing exactly the same thing, inv/getdata but without the bloom filter. It's the same number of trips, except with us we send the bloom filter appended to the getdata and then we mostly get the response in full with no other re-requests needed 99% of the time...with their approach they will be doing re-requests on top of that. Anyway, I don't mean to bash them, I wish them luck...as I've said before, if they can figure it out and make it work I'm all for it...but it ain't as easy as it sounds.
 

freetrader

Moderator
Staff member
Dec 16, 2015
2,806
6,088
Thanks for the interesting analysis of CB and commentary.
My take-away as an outsider is that GMax is now very circumspect about correct attribution, which can't be bad.
 
  • Like
Reactions: awemany

Peter R

Well-Known Member
Aug 28, 2015
1,398
5,595
I hope no one's offended by trying to put things into perspective, because I think that's what is really going on in that damn discussion.

The argument started when Lightsword said that miners turn off their Bloom filters due to DoS concerns (implying that Xthin thus won't be practical). I then pointed out (with sloppy language in hindsight) that the Bloom filter he was referring to was different than the one used by Xthin (i.e., it would not be turned off nor would the DoS vectors necessarily be the same). I should have made it more clear that I was referring to the instantiation of the filter and I wasn't trying to make statements about the software class itself.

At first Gmax responded by following my intended context, but then--after I had responded in turn--he injected a new paragraph in his earlier post about how Xthin uses Matt's CBloom filter, thereby adding confusion regarding what we were even talking about.
 
Last edited:
  • Like
Reactions: awemany

Peter R

Well-Known Member
Aug 28, 2015
1,398
5,595
We should do a little study on how many tx's are missing and have to get sent with the xthin...that will tell us how many round trips we saved by using the bloom filter.
Great idea! Do you think we can fish this out of the logs from @sickpig's experiments?

Just taking a quick log at my logs, eyeballing it, it seems that almost 50% of the time there are tx's missing from the mempool and will cause Core's version to re-request those. It's not all that easy, as we found out, to keep the mempools in sync as sometimes tx's are not announced or they were rejected by the mempool limiter.
Yeah, this is why before I read BIP 152 I was under the impression that they must be doing something else on top to remedy this problem. As far as I can now tell, they are not; "compact blocks" (low-BW mode) is essentially Xthin without the Bloom filters.

Regarding your note that transactions are missing ~50% of the time, BIP 152 present this result, which I took to mean that in their testing, transactions were only missing ~10% of the time. But I'm not sure if I'm parsing that correctly (of course, even at 10% I think the use of the Bloom filter is more than justified).



Furthermore the idea that they are doing less round trips is bogus, they are doing exactly the same thing, inv/getdata but without the bloom filter. It's the same number of trips, except with us we send the bloom filter appended to the getdata and then we mostly get the response in full with no other re-requests needed 99% of the time...with their approach they will be doing re-requests on top of that.
What they've done is created a "high-BW" mode which--if enabled--blasts out the thin block unsolicited along with the INV to select peers. So, in this mode, it is possible to receive blocks in 0.5 round trips. But in my opinion, this is an entirely different product, and calling it part of Compact Blocks just serves to add confusion to the comparisons with Xthin.

But yes, for compact blocks in low-BW mode, they appear to be doing the inv/getdata like us but without the Bloom filter. In fact, from a mathematical perspective, could it not be argued that Xthin degenerates to Compact Blocks (low-BW) as this size of the Bloom filter approaches zero?

Anyway, I don't mean to bash them, I wish them luck...as I've said before, if they can figure it out and make it work I'm all for it...but it ain't as easy as it sounds.
It would be really nice if we could stop arguing about what way is "better" and instead discuss the properties of each technique and allow the free market to make its choice.
 

Peter R

Well-Known Member
Aug 28, 2015
1,398
5,595
I think that with Core's intended strategy of dropping the bloom filter they're going to find that they will be doing a lot of round trips. Keeping the mempools in sync is not very realistic and will only diverge further as transactions rates rise giving rise to more re-requests IMO. We should do a little study on how many tx's are missing and have to get sent with the xthin...that will tell us how many round trips we saved by using the bloom filter. Just taking a quick log at my logs, eyeballing it, it seems that almost 50% of the time there are tx's missing from the mempool and will cause Core's version to re-request those. It's not all that easy, as we found out, to keep the mempools in sync as sometimes tx's are not announced or they were rejected by the mempool limiter.

Furthermore the idea that they are doing less round trips is bogus, they are doing exactly the same thing, inv/getdata but without the bloom filter. It's the same number of trips, except with us we send the bloom filter appended to the getdata and then we mostly get the response in full with no other re-requests needed 99% of the time...with their approach they will be doing re-requests on top of that. Anyway, I don't mean to bash them, I wish them luck...as I've said before, if they can figure it out and make it work I'm all for it...but it ain't as easy as it sounds.
I think you nailed it when you pointed out that the mempool sync problem will get worse as transaction rates rise.

I ran a simulation where I assumed various levels of mempool homogeneity, and, indeed, as the transaction rate rises (i.e., as we get more transactions per block) there comes a point when nearly all of the the "compact blocks" will be missing at least one transaction and thus require a second round trip.



On the other hand, the performance of Xthin should be highly consistent as transaction rates rise.

What this means is that for large blocks, "compact blocks" will typically result in a second round trip compared to Xthin but it will benefit from bandwidth savings approximately equal to the size of the Bloom filter.
 
Last edited:

Peter R

Well-Known Member
Aug 28, 2015
1,398
5,595
I've seen Gmax make the false claim (cf. last paragraph) a few times now that Xthin requires a second round trip 66% of the time. It looks like he got this from @hfinger's post up-thread. Although hfinger's experience was an edge case (typical performance is much better) [EDIT: and was made worse due to bugs inherited from Core that have now been fixed], it motivated me to examine the math.

It is a standard result that the probability of a false positive in a (ideal) Bloom filter (P_fp) is approximately

P_fp = 2^[(-m/n)*ln 2],

where n is the number of elements the filter is taken over, and m is the size of the filter in bits. I used this equation to draw the following chart, illustrating the Bloom filter's performance as a function of both the size of the mempool and the size of the Bloom filter.



As can be seen, there is a narrow band marked "sweet spot" where the filter is not so big that it wastes bandwidth unnecessarily, and not so small that all the information regarding the contents of a node's mempool become lost.

An extra round trip is required if any of the transactions the node is missing result in a false positive when checked against the Bloom filter. The probability of an extra round trip (P_extra) is thus given by the equation

P_extra = 1 - (1 - P_fp)^(N_missing)

Now back to @hfinger's experience of P_extra = 66%...If we assume that his Bloom filter wasn't saturated and that its false positive rate was something like 1% - 0.1%, then solving for N_missing in the above equation implies that his node must have often been missing hundreds of transactions from blocks (very poor mempool homogeneity). The other explanation is that his Bloom filter was saturated (from my graph, it looks like 20,000 TXs in mempool is getting close to the ceiling of what the 32kB Bloom filter size limit supports).

Anyways, what I thought was interesting is that--given the way the math works out--Xthin suffers a problem similar to the problem I just described with compact blocks with the extra round trip, but in the case of Xthin, the problem requires vastly higher levels of mempool heterogeneity in order to occur (or a saturated Bloom filter).
 
Last edited: