DoS Protection for UDP-Based Protocols

freetrader · Mar 2, 2016

I am opening this thread to serve as a meeting point for discussion on how to protect future UDP-based p2p communications - such as the coming Blocktorrent protocol - against DoS attacks.

I have found the following paper which discusses fragmentation-based attacks, but I am interested in other possible attacks and defenses.

https://www.eecis.udel.edu/~mills/teaching/eleg867b/dos/p2-kaufman.pdf

Any input is welcome.

jtoomim · Mar 2, 2016

Blocktorrent will very rarely send out packets that are larger than the MTU, and should not have trouble with fragmentation. Fragmentation is undesirable even in non-adversarial contexts because if one IP packet that is part of the UDP datagram gets dropped, then the whole datagram gets dropped. Thus, a datagram that requires 2 packets will have 2x the loss rate.

jl777 · Mar 2, 2016

I see two approaches: some sort of higher level connection level "authentication". assuming the packets are raw data and not encrypted, then a session based id for each UDP packet.

this shifts the problem to how to make it cost more for the attacker to get the session based id. It is a tradeoff between how much protection you want vs. how much resources it would cost for honest nodes. Assuming a few seconds can be used for PoW calcs during setup of connection between peers, then this at least provides some leveraged protection and the possibility of blacklisting, etc.

Ultimately for UDP, without a per packet PoW barrier, nothing prevents a DOS attack. If things are calibrated so that the CPU time need to calculate the PoW per packet is a bit less than the overall bandwidth throughput on the lower end nodes that are expected, then it wont affect the bandwidth, but of course be murder on CPU usage. I was getting about 100:1 leverage without noticeable speed degradations, maybe could push it to 500:1. That would mean in addition to bandwidth, the attacker would need 500x the CPU cores as compared to the nodes it is attacking

James

freetrader · Mar 2, 2016

James, thanks for your input!

I am assuming that bitcoin's core protocols will remain raw, unauthenticated and unencrypted for a while to come (though I've been known to wrong on many occasions, and I'm not terribly well-informed as to current plans in e.g. the Core / XT projects). I am confident that some sort of session id's will need to be established and used for UDP-based exchanges.

I really like your suggestion of a per packet PoW barrier. It would probably require tuning per receiver, since bandwidth availability could vary greatly among peers, and using the lower end nodes as a reference might still leave the door comfortably open to a high-powered attacker.

jl777 · Mar 2, 2016

the problem is that all peers need to use the same parameters
otherwise the attacker just claims he runs the lowest end nodes
also nonce protection consumes CPU and no bandwdith, that is what creates a leverage. The attacker would need DIFF times more calculations than the receiver.

I wrote an encrpted UDP network with onion routing and PoW protection, so this brings back memories. the CPU really is a lot faster than UDP packets so you can do a lot of PoW per packet, a surprising amount. since we are constrained spacewise where every byte counts, using a 32bit nonce for the PoW would make sense. I used the SaM hashing, but any pure CPU algo would work.

For the tuning,I would suggest a global network diff, which could be even set to being OFF, or at the lowest setting where the nonce just acts as a checksum, so then it wont even take any extra space assuming you are using a checksum already. Not sure if 16 bits is enough, but if 65536 iterations of the hash function is more than ever needed, then the checksum/nonce can be as small at 16 bits.

On testnet the network diff could be adjusted manually to calibrate the values with a typical cross section of nodes. Once the magic diffs are calibrated, then some sort of attack detection could be added to really boost the diff during times of an active attack. And actually with a lookup table, the nonce diff can be encoded into 8 bits, 256 different levels of protection seems plenty

A simple adaptive method that comes to mind would only require that each node be able to notice it is being attacked. Then it would increment the diff to the next leverage level. If it detects it is not being attacked, it downshifts all the way down to the global diff, one step at a time. not sure how frequently it should change, the goldilocks principle would apply, not too fast, not too slow

Now the dynamic changing of nonce diff breaks "contact" with all the other nodes, but if there is another bit or two allocated in the packet, it can use this to signal the other nodes that it shifted its nonce diff. Ah, yes, my problem was I changed keypairs, so lost all contact in some situations. here it is plaintext, so the diff used can just be another part of the packet header.

OK, here is quite simple protocol:
0. Get global minimum/maximum nonce diff (probably hardcoded per release, but need to be careful about backwards and forwards compatibility)

sending packets
1. add nonce and nonce diff to each packet
2. if being attacked, increase nonce diff
3. if not being attacked, decrease nonce diff

receiving
1. verify nonce satisfies diff and that diff is above global minimum and below max
2. average diff changes from peers and update attack status

Need to be careful about false positives as the above will propagate attack status from one node across the network. But this allows a way for the entire network to adaptively change the nonce diff to rather extreme levels if it is a sustained large scale attack. The larger the attack, the higher the nonce diff goes, so it would ramp up the diff until the attacker cant keep up, regardless of how much CPU he gets (subject to global max limits)

Essentially under attack, all nodes go into full red alert and each packet might take 400% CPU to retain realtime bandwidth and it would be possible to sacrifice some throughput to really boost the nonce diff

If possible, such adaptive protection should be used for all p2p comms, as currently there is basically no protection against DOS attack in any crypto, though TCP connections offer some protections, but with the additional step of establishing a valid bitcoin peer, the tcp "protection" is nothing

James

P.S. By occasionally posting info into the blockchain, all nodes using the nonce protection can coordinate even without requiring all such nodes all have paths to each other. Of course, best would be if the miner put a byte into the coinbase script or any other place that has no effect on blockchain status

freetrader · Mar 2, 2016

@jl777: Do you know if there are PoW functions whose verification can be parallelized across several CPUs? So it would make it possible to really ramp up the difficulty a lot per packet?

I've watched this talk but I haven't read up on the topic much:

http://www.irongeek.com/i.php?page=videos/bsidesboston2015/105-in-pursuit-of-a-better-crypto-puzzle-samuel-erb

jl777 · Mar 2, 2016

Since there will probably be a lot more packets than cores, not sure the advantage of a parallelizable PoW. In any case, we assume the attacker has X amount of resources, so what will create the leverage is to require disproportionate resources to generate a valid nonce.

Any decent cryptographic hash would do. Say we calculate a 32 bit crc of the packet, take the bottom 16 bits. Now require the hash to have the exact same 16 bits in the lowest bits. I think this makes it so that you need to try many hashes to find the nonce that generates a checksum hash. I am not theoretical math guy, so dont ask me for fancy equations with all the squiggly lines. I would just write the code and measure what actually happens, but if the hash function is generating truly random output, then to match 16 bits would take on the order of 65536 hashes. Maybe it is half that or double, but I guess the nonce would need to have more than 16 bits or maybe some packets wont even have a valid nonce.

And of course there needs to be the usual timestamp and other fields in the packet so that the same nonce cant be reused over and over. A simple way to guarantee uniqueness is just to increment a counter for how many packets have been sent directly to the other peer. due to no order guarantee, on the receiving side you would need to keep a moving window that buffers the counter values that have been received and reject any with the same counter. It can be a bitmap, so cost of 1 bit x max out of order depth

It would be a fair amount of work to get all this done, but it would create an adaptive self-defense shield against attackers and is probably required before anything can scale up to mass market readiness.

James
[doublepost=1456935875][/doublepost]best to use hash algo that doesnt have any ASIC and hard to use GPU, burn the actual CPU cycles
[doublepost=1456936507,1456935837][/doublepost]https://bitcointalk.org/index.php?topic=1357845.msg13919273#msg13919273

The above has a highly efficient solution to a somewhat related problem. Basically it is splitting a single dataset into N subsets and allows verification of each subset. This maps to the torrenting of a block. The problem in this context is my code only uses 100 nanoseconds per subset, so additional constraints needs to be added.

However, each value here has algebraic structure. Which would allow using cryptographic results for the DOS protection. and it uses curve25519 which doesnt have any hardware for it that I am aware of

By adding constraints to what values are allowed, you can create the leverage and the same process allows to verify data integrity of the entire file.

freetrader · Mar 2, 2016

James,

I have looked briefly at the Bitmessage code and they do something like you explained in your initial reply post, except with sha512 and minus the moving window (I believe).

Yes, definitely must be a CPU-hard function with no ASIC.

I'll need some time to digest the post you referred to, again, thanks for you valuable inputs!

jl777 · Mar 2, 2016

I am using torrents only for unchanging files, as any change in the file changes the hash used in the torrent. This happens to work great for the iguana's read-only bundle files, which indeed never change, so will allow for bandwidth saturation seeding of new nodes. It will use the mainstream bittorrent peers, so will get very large redundancy.

Not sure the big win that comes from using torrents for the blocks themselves, since the bitcoin network itself is already p2p. The INV, GETDATA, DATA sequence isnt the most elegant, but it avoids the very bad behavior most of the time.

But for a new block, there needs to be a new torrent file that needs to propagate outside the torrent network and it wont be practical to use it for new blocks where it is time critical. So the mainstream bittorrent network cant be used for this. This indicates the propagation needs to use the INV/GETDATA or something similar so the nodes know what to query.

OK, I see, with the larger blocksizes, you want to break up the fetching of each block across multiple peers. yes, that makes sense to reduce things below the 1400 size, but using UDP it works fantastic till things congest and it doesnt work at all for a while. Great care must be taken with the flow control or you end up slower than before.

What is the reasoning for UDP? The statelessness has both positives and negatives. If the goal is just very fast transfer, then iguana already shows how this can be done as it can saturate a 1gbps connection, it can only process all the data at about half that speed even with 8 cores. I think the same effect as torrent would be able to be implemented without all the UDP headaches if a new GETDATA msg is added, one that can get a range of the block. I guess GETRANGE where you ask for (txid, startoffset, length) instead of just txid.

Do you know the status of the blocktorrenting implemetation? I could add GETRANGE support to iguana this weekend, if the blocktorrenting is not done yet

freetrader · Mar 2, 2016

Blocktorrent is in prototype development stage.
Perhaps we should move further discussion of how best to torrent block data into a separate thread since we're getting slightly off the DoS topic?

jl777 · Mar 2, 2016

is there already a thread on blocktorrent?

jtoomim · Mar 2, 2016

PoW on connection creation does not sound like a bad idea.

PoW for each packet sounds like a bad idea. Blocktorrent needs to not use much CPU to run.

I would really rather spend time implementing the basic algorithm right now than discussing ways in which this protocol could get attacked that are not unique to this protocol.

Blocktorrent does not work by sending ranges of the block. It works by splitting up the block using the merkle tree, and by sending parts of the merkle tree. This allows for the reuse of transaction data that's already in mempool, and it allows for verification of each chunk of data.

jl777 · Mar 2, 2016

yes, sending small groups of txdata makes sense. then regardless of which ones get mined into a block, the new block can be transmitted just with the tx a node doesnt already have. though not sure how to find the exactly list of tx that a node doesnt have in an efficient way using the existing protocol. Since the txids arent explicitly included in the rawdata, a new message that returns all the txid in a block might be needed

jtoomim · Mar 2, 2016

Blocktorrent is an entirely new protocol. It does not use the existing p2p protocol. It is 100% new messages.

jl777 · Mar 2, 2016

i understand. just thinking about doing a similar approach using existing protocol, with as few changes as possible. Would be interesting to see the performance difference between a tcp vs udp

do you have any estimates as to effective throughput the udp blocktorrent would achieve?

jtoomim · Mar 2, 2016

The performance difference for UDP vs TCP is huge for short bursts of activity over high-latency links.

With TCP, the congestion window size doubles every round-trip time, so it takes several multiples of the round-trip time before full bandwidth can be used. For block transfers around 1 MB, this means that the full bandwidth is usually never achieved before the transfer is over.

Also, TCP takes any packet loss as a sign of congestion, and will decrease the CWND size (and slow down the transfer). The GFW injects packet loss at a rate between 1% and 50% (depending on the route and the whims of the GFW servers at the time), which totally trashes TCP bandwidth. This packet loss rate is not dependent on the actual traffic volume, so the best response to it is to simply send more packets and send them at a higher rate, and add some redundancy to the data transfer.

With a decent UDP algorithm, you can get 90 Mbit/s of actual data transfer across the GFW between two computers with 100 Mbit/s connections with no ramp-up time, whereas with TCP you often will get transfer rates as low as 100 kbit/s. Both of those numbers are the result of actual benchmarks that I have either done myself or witnessed first-hand, by the way.

jl777 · Mar 2, 2016

interesting results. I am seeing about 100KB/sec to 1MB/sec per peer during initial sync when there are many requests to be overlapped, but toward the end it sure slows to a crawl probably due to what you observed.

I am doing a totally new bitcoin core from scratch and would be interested in using the blocktorrent for the realtime blocks.

jtoomim · Mar 3, 2016

@jl777: blocktorrent thread is here: https://bitco.in/forum/threads/buip006-blocktorrent-–-a-torrent-style-block-data-transport.747/

DoS Protection for UDP-Based Protocols

Moderator

Active Member

Active Member

Moderator

Active Member

Moderator

Active Member

Moderator

Active Member

Moderator

Active Member

Active Member

Active Member

Active Member

Active Member

Active Member

Active Member

Active Member