Hi folks. I don't usually watch this forum (nothing against it -- just not one of my habits), so if anyone is interested in helping implement this, they should contact me elsewhere and we can chat.
I do intend to implement this myself if nobody else does, but I don't have time to do it all myself right now.
I'd like the implementation of this feature to be as independent as possible of the many locks in bitcoind (especially cs_main, but also mempool.cs). To minimize lock contention, I think it will be a good idea to implement our own transaction cache outside the purview of any main or mempool locks.
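To make the idea concrete, here's a rough sketch of what such a standalone cache might look like. The class name and API are my own invention for illustration, not anything in bitcoind; the point is simply that it guards its own data with its own lock and never touches cs_main or mempool.cs:

```python
import threading

class TxCache:
    """Hypothetical standalone transaction cache (sketch only).

    Guarded by its own lock so blocktorrent lookups never have to
    acquire cs_main or mempool.cs. Names/API are assumptions, not
    actual bitcoind code.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._txs = {}  # txid (bytes) -> raw tx (bytes)

    def insert(self, txid, raw_tx):
        with self._lock:
            self._txs[txid] = raw_tx

    def get(self, txid):
        with self._lock:
            return self._txs.get(txid)  # None if unknown

cache = TxCache()
cache.insert(b"\x01" * 32, b"rawtx")
assert cache.get(b"\x01" * 32) == b"rawtx"
```

How this cache gets populated (and how it interacts with mempool eviction) is the hard part, of course; this only shows the locking boundary.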
I also looked into SCTP, but since we can get the same data from multiple peers, I think we should instead make redundant requests for the same missing data from several peers at once.
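By redundant requests I mean something like the following toy scheduler: each missing chunk gets assigned to more than one peer, round-robin, so a single slow or dropped response can't stall the download. (Illustrative only; names and the redundancy factor are made up.)

```python
from itertools import cycle

def assign_requests(missing_chunks, peers, redundancy=2):
    """Assign each missing chunk to `redundancy` distinct peers,
    round-robin, so one dropped response doesn't stall us."""
    assignments = {chunk: [] for chunk in missing_chunks}
    peer_cycle = cycle(peers)
    for chunk in missing_chunks:
        while len(assignments[chunk]) < redundancy:
            peer = next(peer_cycle)
            if peer not in assignments[chunk]:
                assignments[chunk].append(peer)
    return assignments

plan = assign_requests(["c0", "c1", "c2"], ["A", "B", "C"], redundancy=2)
assert all(len(v) == 2 for v in plan.values())
```

A real version would weight peers by measured latency and cancel outstanding requests once a chunk arrives.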
I also think there's a ton of cool stuff we could do with forward error correction so that it doesn't matter which packets get dropped, as long as enough get through.
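The simplest possible illustration of that FEC idea is a single XOR parity packet: with one parity packet over N equal-sized data packets, any one lost packet can be reconstructed from the survivors. A real implementation would use something stronger, like Reed-Solomon or a fountain code, but the toy version shows the principle:

```python
def xor_parity(packets):
    """XOR equal-sized packets together into one parity packet.
    (Toy FEC; real code would use Reed-Solomon or a fountain code.)"""
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, b in enumerate(pkt):
            parity[i] ^= b
    return bytes(parity)

def recover(received, parity):
    """Recover the single missing packet from the survivors + parity."""
    return xor_parity(list(received) + [parity])

data = [b"aaaa", b"bbbb", b"cccc"]
p = xor_parity(data)
# pretend data[1] was dropped on the wire; rebuild it:
assert recover([data[0], data[2]], p) == b"bbbb"
```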
I think that each peer connection should be implemented as a TCP connection for handshake/control information and little else (in part to avoid some DoS scenarios, especially being complicit in an amplification attack), with a UDP port open for the actual data transfer. The TCP connection could be the standard bitcoind p2p TCP connection, but I think it would be preferable to keep it completely separate.
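To show what the TCP control side might carry, here's a hypothetical handshake message: the field names, sizes, and magic bytes below are pure assumptions on my part, not a spec. The key point is that the UDP data port is advertised over the authenticated-by-connection TCP channel rather than discovered blindly:

```python
import struct

# Hypothetical handshake layout (all fields are assumptions):
#   magic (4 bytes) | version (2) | flags (2) | udp_port (2)
HANDSHAKE = struct.Struct("!4sHHH")  # network byte order
MAGIC = b"BTRT"

def encode_handshake(version, flags, udp_port):
    return HANDSHAKE.pack(MAGIC, version, flags, udp_port)

def decode_handshake(payload):
    magic, version, flags, udp_port = HANDSHAKE.unpack(payload)
    if magic != MAGIC:
        raise ValueError("not a blocktorrent handshake")
    return version, flags, udp_port

msg = encode_handshake(1, 0b0001, 8335)
assert decode_handshake(msg) == (1, 1, 8335)
```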
I expect each peer to be connected to a larger number of peers than usual, around 50 to 200, through this protocol in order to achieve optimal performance. The main p2p protocol suffers performance hits if the number of peers is too high, due to the way block and transaction invs are broadcast to everyone and the way blocks are uploaded to peers in parallel instead of in series (http://rusty.ozlabs.org/?p=509). That problem won't occur with blocktorrent, and the invs come only once every 10 minutes instead of several times per second.
I also have some ideas for peer preference flags. A peer could identify itself as a miner (optimize for early delivery to me, please!), a supernode (I have a lot of bandwidth and will upload a lot!), or a low-bandwidth user (don't send me more than you have to, and don't prioritize my latency!).
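These preferences compose naturally as a bit-flag field; something like this (flag names are illustrative, not a proposed wire format):

```python
from enum import IntFlag

class PeerPrefs(IntFlag):
    """Hypothetical peer preference flags (names are illustrative)."""
    MINER = 1          # optimize for early delivery to me
    SUPERNODE = 2      # I have lots of bandwidth and will upload a lot
    LOW_BANDWIDTH = 4  # send me as little as possible

# a mining pool with good connectivity might set both of these:
prefs = PeerPrefs.MINER | PeerPrefs.SUPERNODE
assert PeerPrefs.MINER in prefs
assert PeerPrefs.LOW_BANDWIDTH not in prefs
```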
Further options could be for whether it is permitted to start sending data without it being requested first, and if so, how much (an optimistic transmission limit). As a well-connected miner, I might be willing to accept 50 kB of data from you without asking for it, just to reduce latency. (That should be enough to encode a typical 1 MB block, given the compression tools we have.) If I tell all of my peers that, I might end up getting 2.5 MB of data sent to me in total when a new block is published. That would be okay with me, because it gets me the block 300 ms faster. Others might only be willing to accept about 2 kB optimistically -- enough for the header and one intermediate level of the merkle tree. That saves on the number of round trips, since from that point you can start requesting the descendants of those nodes in parallel from different peers.
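A quick sanity check on those numbers (32-byte hashes, 80-byte header; the depth-6 level and the 50-peer count are just the illustrative figures from above):

```python
HASH_SIZE = 32     # bytes per tx/merkle hash
BLOCK_HEADER = 80  # bytes

def merkle_level_bytes(depth):
    """Bytes needed to send all 2**depth hashes at one merkle depth."""
    return (2 ** depth) * HASH_SIZE

# Header plus one intermediate level of 64 hashes is about 2 kB:
assert BLOCK_HEADER + merkle_level_bytes(6) == 2128  # ~2 kB

# 50 kB accepted optimistically from each of ~50 peers is ~2.5 MB total:
assert 50 * 50_000 == 2_500_000
```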
It will be very useful to have accurate information about what each peer has in their mempool. That way, we can guess which transactions they have and don't have, and volunteer some of the missing ones. Bloom filters could be useful for this task. It could also be worthwhile and affordable to keep a bitmap or boolean array for each peer indicating which transactions they have. That might be easier to program with and more reliable than Bloom filters. If you have 40 peers, and you can keep track of mempools with up to 200k transactions each, that would require 1 MB of RAM. Probably affordable, right? That bitmap could be populated by listening to each peer's INV messages for transactions. This could be complicated by eviction policies, though, and would involve more interaction with the main mempool and p2p code (since you don't get INVs for txs over the blocktorrent ports and connections as I currently envision it).
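The 1 MB figure works out like this: 200k transactions at one bit each is 25 kB per peer, times 40 peers. A minimal bitmap sketch (the class and its slot-numbering scheme are my own illustration; how mempool slots get assigned is out of scope here):

```python
class MempoolBitmap:
    """Hypothetical per-peer bitmap: one bit per mempool slot,
    set if we believe that peer has the corresponding tx."""

    def __init__(self, capacity):
        self._bits = bytearray((capacity + 7) // 8)

    def mark(self, slot):
        self._bits[slot // 8] |= 1 << (slot % 8)

    def has(self, slot):
        return bool(self._bits[slot // 8] & (1 << (slot % 8)))

# Memory check for the numbers above: 40 peers x 200k-tx bitmaps = 1 MB.
per_peer = len(MempoolBitmap(200_000)._bits)
assert per_peer == 25_000          # 25 kB per peer
assert 40 * per_peer == 1_000_000  # 1 MB total

bm = MempoolBitmap(200_000)
bm.mark(123_456)
assert bm.has(123_456) and not bm.has(123_457)
```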
If we know that a peer has a tx, for example, we can identify it pretty reliably by using just the last 4 or 5 bytes of its hash. We can also check their other mempool txs to see whether any share the same short hash. If the peer runs into a collision that we didn't expect (very rare by accident, though easy to force intentionally), they can detect it and request the full tx hash, or the full tx, either from us or from one of their other (maybe lower-latency) peers. This means that we can usually encode a 1 MB block with 3000 transactions using a bit more than 12 kB of traffic, if the peer already has the transactions. (Thanks to Felix Weis for this idea.)
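In code, the short-ID table with collision detection might look roughly like this (function name and 4-byte default are illustrative):

```python
def short_ids(txids, length=4):
    """Map each full txid to its last `length` bytes, flagging any
    short IDs shared by two different txs so the full hash can be
    requested instead. (Illustrative sketch.)"""
    seen = {}
    collisions = set()
    for txid in txids:
        sid = txid[-length:]
        if sid in seen and seen[sid] != txid:
            collisions.add(sid)
        seen[sid] = txid
    return seen, collisions

txids = [bytes([i]) * 32 for i in range(3)]
table, collisions = short_ids(txids)
assert not collisions

# 3000 txs at 4 bytes each is 12 kB, matching the estimate above:
assert 3000 * 4 == 12_000
```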