BUIP010 (passed): Xtreme Thinblocks

Peter Tschipper

@Peter R

Peter R said:
So I suppose Node B can determine what Node A used for m and k (and any other hashing details required) based on some header bytes attached to Node A's Bloom filter?
You are correct. When we send a bloom filter it's not just the raw filter, it's a class object which includes the other data (nHashFuncs, nTweak, etc.).
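For anyone following along, the serialized filter carries roughly these fields (a simplified sketch modeled on Bitcoin Core's CBloomFilter in bloom.h; exact members may differ by version):

    #include <vector>

    // Simplified sketch of the data that travels with the filter message
    class CBloomFilter
    {
        std::vector<unsigned char> vData; // the filter bits themselves (m bits)
        unsigned int nHashFuncs;          // k: number of hash functions to apply
        unsigned int nTweak;              // random seed mixed into each hash
        unsigned char nFlags;             // update behaviour (BLOOM_UPDATE_*)
    };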
 

dgenr8

@Peter Tschipper

Dagur and I are having trouble understanding why you think this protocol (Xtreme) is better than Hearn's original thin blocks, coming in XT 0.11.0E (XT).

Are you aware that all bitcoin nodes already maintain a bloom filter "filterInventoryKnown" for each peer, of the 50,000 most recent transactions that they already know that peer to have?

In the XT best case, the coinbase and merkleblock are pushed to the receiver, and there is no round trip whatsoever. Any transactions not known to be known by the peer are also pushed immediately, with no round trip required. If the receiver is somehow missing transactions that the sender thought it had, it requests them all at once. It can also receive parts of a block from multiple peers at once, allowing them to race to provide the details.

In the Xtreme best case, a round trip is always required for the receiver to share its mempool filter, and as far as I can tell there is no multiple-peer support.
 

hfinger

I think it could be problematic to always use only the first 64 bits of the transaction hashes. If someone wanted to attack the network, couldn't he produce a rainbow table of transactions to create collisions among the weak 64-bit hashes?

As an alternative, I would suggest not always using the first 8 bytes of the 32-byte tx hashes, but instead random bytes of the full 32. For each thinblock you could first randomly determine which 8 of the 32 bytes are used, then send this selection together with the thinblock.
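A rough sketch of the idea (hypothetical code, purely illustrative; none of these names exist in any client):

    #include <algorithm>
    #include <array>
    #include <cstdint>
    #include <random>

    // Once per thinblock: randomly choose which 8 of the 32 txid byte
    // positions will be used, and send this selection with the block.
    std::array<uint8_t, 8> ChooseBytePositions(std::mt19937& rng)
    {
        std::array<uint8_t, 32> pos;
        for (uint8_t i = 0; i < 32; ++i) pos[i] = i;
        std::shuffle(pos.begin(), pos.end(), rng);
        std::array<uint8_t, 8> chosen;
        std::copy_n(pos.begin(), 8, chosen.begin());
        return chosen;
    }

    // Build the 64-bit short hash by concatenating the chosen bytes.
    uint64_t ShortHash(const std::array<uint8_t, 32>& txid,
                       const std::array<uint8_t, 8>& chosen)
    {
        uint64_t h = 0;
        for (uint8_t i : chosen)
            h = (h << 8) | txid[i];
        return h;
    }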
 

Peter Tschipper

@dgenr8

It looks like we all have different definitions of what a round trip is. Both your version and ours have to issue a getdata and then receive the block, so there is no difference between us there, other than that we send a bloom filter with our getdata.

Nodes have a setInventoryKnown which is used to filter transactions when serving the MSG_FILTERED_BLOCK that your version uses. But it's only a list of what was sent, not a list of what the other peer necessarily has, or still has, in its mempool. There is really no way for you to know what the other node is missing; you're only assuming that you're sending them everything they need.

I'll go through some other differences.

1) Your version sends a merkleblock. It's not the partial merkleblock that MSG_FILTERED_BLOCK was designed for but a full one, and those are big structures: a typical merkleblock for a full block of 3000 txs will be almost 200KB in size, and you're downloading up to 3 of those per block request. So where are the savings there? How can that compete in speed or bandwidth with Xthinblocks, which would be about 25KB plus a bloom filter? (Rough numbers follow this list.)

2) You are also sending txs separately and one at a time, which incurs additional overhead in the form of a TCP ACK for each message and a 24-byte message header...that's roughly 70 extra bytes per tx. Xthinblocks sends only one message, which contains everything the other node needs.

3) Multiple-peer support: yes, we support multiple peers; we can download thinblocks from any peer that supports them. There is a feature that is perhaps misunderstood and was initially designed to make testing easier: it's just a config setting, "connect-thinblock=<ip>" (a sketch follows this list). With it, anybody can connect to a thinblock provider and download thinblocks only from them. That makes testing easier, but there is another use: because Bitcoin Unlimited doesn't have many nodes out there "yet", anyone running a node could in effect become a thinblock provider that other nodes connect to. In this way we can get the benefit of thinblocks to many users right now, until thinblocks are more widely supported. Furthermore, we can connect to as many as 8 thinblock providers with "connect-thinblock". So we could have, let's say, 20 or 30 regular nodes which could still get a thinblock if they happened to ask for one from a supporting peer, and the rest of the nodes could connect to them with "connect-thinblock".

4) Finally, one of the other problems with your version comes when transactions get evicted from the memory pool. Nodes are not necessarily in sync as far as txs go: 0.11.2, for instance, is very aggressive about not letting txs into the pool, 0.12.x takes a different eviction approach, and XT is different again. Now what happens when you have evicted a tx that was already sent to you by the node you are requesting a thinblock from? That node won't send it to you again because of "setInventoryKnown", so you can't get that tx back to rebuild your block, and you're stuck having to request the block in full.
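As a rough sanity check on the numbers in point 1: 3000 txs at 8 bytes each is about 24KB of short hashes, whereas 3000 full 32-byte hashes alone are about 96KB, before counting the rest of the merkleblock structure and the transactions sent alongside it.

And for point 3, the setting is just a bitcoin.conf entry (the addresses below are placeholders, not real thinblock nodes):

    # bitcoin.conf -- only download thinblocks, and only from these peers
    connect-thinblock=203.0.113.10
    connect-thinblock=203.0.113.11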

Parting shots: I don't have any, but I would like to say that in no way is my work here intended as sand in the face of Mike Hearn or the XT group. I have the greatest admiration for the work and sacrifice that Mike made: he took a lot of hits. Xtreme Thinblocks was inspired by him, and he is mentioned as such in the commit log along with Dagurval. Also, it was really Dagurval's tenacity in not letting this go when Mike was letting go that got me off the couch, so to speak, to try to come at it from a different angle. We have two different implementations; I think that's amazing by itself. And a more competitive process, and a community that will ultimately decide what it wants...it's all good.
@hfinger Well, it's an idea, but if that were to occur we would just send a normal thinblock with the full tx hashes...we wouldn't even have to re-request it, as it gets flagged on the sending node. It's already coded that way, tested, and working. I'm not sure the extra work is really worth it to save 30KB or 50KB of thinblock size in such a rare event. Unless anyone else feels strongly about it?
 

dgenr8

@Peter Tschipper

Ok, you're correct about the get block round trip, and I agree that Xtreme achieves bandwidth reduction with the truncated hashes and new protocol (which requires both nodes to be upgraded). These reductions probably overcome the overhead of constructing and sending the bloom filter in most cases.

Regarding #4, the statement that you can't get a tx again is not correct. The individual tx will be asked for (possibly amongst others), and will be sent. filterInventoryKnown (formerly setInventoryKnown) is not checked in getdata.

There's no reason nodes could not support both of these ideas.
 

cypherdoc

@Peter Tschipper

how does the murmurhash split the 64 bit tx hash into the bloom filter bit slots?
 

Peter Tschipper

@cypherdoc

The murmur hash is separate from the tx hash; it's what is used to construct the bloom filter. The 64-bit tx hash comes straight from the uint256 class: there is a method there called "GetCheapHash" which just returns a 64-bit portion of the tx hash.

Oh, just re-read your question: the murmur hash hashes the full tx hash, not the 64-bit version. The 64-bit tx hash is used only when we send back the xthinblock; it's not used to create the bloom filter. Here's a great little tutorial and app which anyone can play around with to actually see bloom filters in action.

http://billmill.org/bloomfilter-tutorial/
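To make that concrete, here are the two pieces side by side (paraphrased from the Bitcoin sources; exact code may differ by version):

    // uint256: the "cheap hash" is just 8 bytes of the txid read as a
    // little-endian 64-bit integer (see uint256.h).
    uint64_t GetCheapHash() const
    {
        return ReadLE64(data);
    }

    // CBloomFilter: each of the k hash functions is MurmurHash3 with a
    // different seed, reduced modulo the filter size in bits (see bloom.cpp).
    unsigned int CBloomFilter::Hash(unsigned int nHashNum,
                                    const std::vector<unsigned char>& vDataToHash) const
    {
        return MurmurHash3(nHashNum * 0xFBA4C795 + nTweak, vDataToHash) % (vData.size() * 8);
    }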
@dgenr8

dgenr8 said:
Ok, you're correct about the get block round trip, and I agree that Xtreme achieves bandwidth reduction with the truncated hashes and new protocol (which requires both nodes to be upgraded). These reductions probably overcome the overhead of constructing and sending the bloom filter in most cases.

Regarding #4, the statement that you can't get a tx again is not correct. The individual tx will be asked for (possibly amongst others), and will be sent. filterInventoryKnown (formerly setInventoryKnown) is not checked in getdata.

There's no reason nodes could not support both of these ideas.
Yup, I know it's not checked in the getdata; it's checked when asking for the MSG_FILTERED_BLOCK. Only txs that are not in setInventoryKnown will get sent. I checked your new code: it's true you changed the filtering mechanism, but most nodes don't support that yet, and it works in basically the same way...although I believe the new filter gets cleared every time a block is found. Maybe you can get the tx again; I'm skeptical, but let's say you are right. It's still not the main issue regarding the differences between the two approaches.

FYI, if you go with the xthinblocks approach you could keep the mempool eviction code in XT, which was a big deal to some of the supporters of XT. Xthinblocks will work with any eviction strategy.
 

YarkoL

@Peter Tschipper
It is my understanding that bloom filters are used in SPV wallets primarily for privacy reasons, for obfuscating the set of transactions that are "owned" by the wallet. Am I correct in assuming that with thin blocks, instead, blooms are used for their small size (more efficient requesting of transactions), and there are no privacy benefits here?
 

cypherdoc

that's what it looks like to me. once blocks are getting relayed around the network, their originating tx's are much less clear.
@Peter Tschipper

why wouldn't xthins be competition for the relay network? miners need to communicate their blocks thru their local full nodes too.
 

Peter Tschipper

@YarkoL

I haven't looked at it in detail, but I believe the bloom filter for SPV wallets is used in somewhat the same fashion as ours. The filters that SPV wallets send are about 2-3KB, at least the ones I've seen, and I've only looked at a couple in passing. But what they initiate is the sending back of a merkleblock, whereas we send back an xthinblock. The merkleblock is designed to send only a "partial" merkle tree which contains the tx hashes and block header that the SPV client is missing. It's very space efficient when you do that: you don't have to include every tx hash, just a few, and at the same time you can verify the merkle root. Then the actual missing txs are sent after the merkleblock. I'm not sure whether any of this has anything to do with privacy; I don't really know.
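As a rough illustration of how compact that is: for a block of ~3000 txs, a merkleblock proving a single matched tx needs only about ceil(log2(3000)) = 12 hashes (384 bytes) plus some flag bytes and the 80-byte header.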
@cypherdoc
cypherdoc said:
that's what it looks like to me. once blocks are getting relayed around the network, their originating tx's are much less clear.

why wouldn't xthins be competition for the relay network? miners need to communicate their blocks thru their local full nodes too.
Good points. I don't know that there is a big gain there in terms of privacy, but I'm not an expert. As for the other point, I don't know exactly how the miners push their blocks to Matt's relay network; I haven't studied his code or how that all works. But if Matt really does stop supporting the relay network, then this is a pretty good replacement for it in my view, and it's decentralized on top of that! (BTW, out of curiosity: is there any real data out there about Matt's relay network and how fast or reliable it is?)
 

Peter R

Peter Tschipper said:
FYI, if you go with the xthinblocks approach you could keep the mempool eviction code in XT, which was a big deal to some of the supporters of XT. Xthinblocks will work with any eviction strategy.
What I like about Xthin is that it seems to give the most autonomy to the individual nodes, while still incentivizing cooperative behaviour between nodes.

For example, Node A in @Peter Tschipper's example could run any mempool policy it chose. If it chose to block a bunch of "standard" transactions from its mempool, then it would need to download more information from Node B to keep up with the network. Similarly, if it chose to accept a bunch of "non standard" transactions into its mempool, then its Bloom filter would be bigger and it would take longer to make the get_data() request. In both cases, Node A receives the complete block information at a greater expense by deviating from "standard" behaviour, but it is still free to do so if it considers the costs worthwhile.

Now hopefully "standard" behaviour becomes an emergent phenomenon rather than a dictate from the Core Commissars.
 

cypherdoc

@YarkoL is correct that they were also designed to create a certain level of false positives thus helping to obscure the specific tx's the SPV client is looking to verify. now that you've taught us better how this is done, i'm assuming it's by decreasing the sparseness of the filter.
 

Peter Tschipper

@cypherdoc Ohhhh, ok...that makes sense. I suppose they could obfuscate by increasing the false positives if they wanted to. It would be up to the SPV wallet to do that.
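For reference, the standard approximation for a bloom filter's false-positive rate with an m-bit filter, k hash functions, and n inserted elements is p ≈ (1 - e^(-kn/m))^k; a wallet that wants more cover traffic can push p up by using a smaller m or inserting extra elements, at the cost of downloading more irrelevant txs.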
 

solex

@dgenr8
I'm thinking aloud here because we all share the common ground that *any* robust thin-blocks enhancement incorporated into standard full node clients is a significant benefit to Bitcoin.

As someone who long ago got the T-shirt for having written usable software and seen it sidelined at the last minute, I fully appreciate the desire not to change course, especially when two alternatives have comparable benefit.

So, if XT proceeds with dagurval's XT-thinblocks solution, is there an advantage in BU supporting it as well? Such that a BU client will default to Xtreme thinblocks when connecting to another node which supports it (likely just BU at present), but will revert to using XT-thinblocks as a backup?

If so, then a BUIP can be raised for implementing this.
@Peter Tschipper your thoughts are welcome as well.

We have no ideological axe to grind and just want the best support for main-chain scaling.
 

Peter Tschipper

@cypherdoc Thanks for that link, it was very enlightening. I think it's given me another idea for extending xthinblocks, maybe as a phase 3. We could use "connect-thinblock-relay" as a true replacement for Matt's relay network. In fact it wouldn't just replace it, but be faster and better. From what I've read so far we are already close: maybe even faster already, or only slightly slower. Matt's relay network seems to be variable in how much data gets sent and can send data the other side doesn't need, sometimes a lot of it. I have to look more deeply at the protocol, but I don't want to spend much time there either, since I think we already have the basis for what we need to make it even better, and of course fully decentralized. Basically the only thing we don't do better is that we use inv/getdata to get our blocks. So if a node set "connect-thinblock-relay", it would be signalling us to send it a thinblock as soon as we received a block, rather than having to inv/getdata. That does two things: one, it saves time; two, it does away with the need for a bloom filter. The downside is the other node would have to maintain a mempool big enough to make sure it has every tx it needs to build the block, otherwise it would have to re-request.
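A sketch of the difference in flows (hypothetical, since "connect-thinblock-relay" doesn't exist yet):

    today:         block found -> inv -> getdata + bloom filter -> xthinblock
    phase 3 idea:  block found -> xthinblock pushed immediately (no inv/getdata, no bloom filter)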

Here are a few interesting snips from matt's relay website.

* The relay nodes do not follow the standard inv-getdata-tx/block flow, but instead relay transactions/blocks immediately after they have done their cursory verification. They do keep some track of whether or not your nodes claim to have seen the transactions/blocks before relaying, but you may see transactions/blocks being sent which you already have and have not requested; if this is a problem for you due to bandwidth issues, you should reconsider your bandwidth constraints and/or whether you are peering with too many nodes.
* The relay nodes will all relay among themselves very quickly, so there is no advantage to peering with as many relay nodes as you can find; in fact, the increased incoming bandwidth during block relay spikes may result in higher latency for your nodes.
* The relay nodes are NOT designed to ensure that you never miss data, and may fail to relay some transactions/blocks. The relay nodes are NOT a replacement for having peers on the standard P2P network, nor should you rely on them as your only fast block relay method.
 

cypherdoc

@Peter Tschipper

Peter Tschipper said:
The downside is the other node would have to maintain a mempool big enough to make sure it has every tx it needs to build the block, otherwise it would have to re-request.
for nodes wishing to receive xthins w/o a preceding inv/getdata/bloom it could be recommended/required for them to have at least 3.4GB of total memory. i run 6 nodes and my 1GB+2.4GB vswap setup has never been overwhelmed (got close) even during spam attacks.
but then again, if we start getting bigger blocks, mempools are going to start getting cleared out pretty well, hopefully.
 

rha

I recently played with indexing tx hashes from the main blockchain. No two different hashes share the same first eight bytes: that's more than 100,000,000 tx hashes without a single 64-bit collision. This isn't proof there will be no collision in the future, but it shows how good the 64-bit tx hashes are.
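A quick birthday-bound check agrees with this: with n = 10^8 hashes drawn from a 64-bit space, the expected number of colliding pairs is roughly n^2 / (2 * 2^64) ≈ 0.0003, so seeing zero collisions is exactly what one would expect from honest data. An attacker deliberately grinding for collisions is a different matter, of course, which is what the earlier rainbow-table question was about.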

(In contrast, 160-bit hashes of BTC addresses often collide even with equal first 18-19 bytes, differing only in the last several bits.)
 

awemany

rha said:
(In contrast, 160-bit hashes of BTC addresses often collide even with equal first 18-19 bytes, differing only in the last several bits.)
That sounds really weird - is the hash function that bad or have these addresses been designed that way? Do you have examples? I assume you are not talking about fake addresses that just have the correct checksum but no (known) corresponding private key?
 