Connection Slot Exhaustion attack?

Peter Tschipper

Active Member
Jan 8, 2016
254
357
There seems to be a bit of a connection slot attack going on, with bitcoinj 0.13.0 clients connecting and then doing nothing but ping/ponging. It's easy enough to prevent: go to your Peers tab, locate the bitcoinj 0.13.0 clients that are coming from the same network (mine are all coming from 37.97.164.xxx) and ban them for a year.

Once your slots are full, your node is no longer registered as running on bitnodes, even though it's up and running. Banning them might give us back a few nodes... it seems that Classic nodes are being attacked as well.
------------

Still ongoing today... it seems to be stealthy. There was a similar one about 7 or 8 weeks ago which was more obvious. They've discovered my other nodes, but I banned them there as well. I'll have to keep an eye on it.
 

Peter Tschipper

I'd like to start thinking about putting in some DoS mitigation for this kind of thing. A couple of approaches come to mind. One is to limit the number of connections that share the same network prefix, so in this case only one of the 37.97.164.xxx connections would be possible. We would need some kind of config setting to override that in order to do regression testing, and also for anyone who has some other reason to override it. The other approach would be to look at what a connection is doing: if a peer connects, does the handshake, exchanges versions and does a ping/pong, but then doesn't do anything between the next rounds of ping/pong, we could disconnect and ban that peer for a few hours or so.

Any thoughts, ideas, opinions?
 

freetrader

Moderator
Staff member
Dec 16, 2015
2,806
6,088
Found a couple of those on my node, this sorts them out from the command line (needs 0.12+):
Code:
bitcoin-cli setban "37.97.164.0/24" "add" 31536000
(the last argument is the number of seconds in 365 days)

Perhaps it's worth taking another look at XT's anti-DOS patches, to see if those could be built on.
This patch set introduces code that runs when a node is full and otherwise could not accept new connections. It labels and prioritises connections according to lists of IP ranges: if a high priority IP address connects and the node is full, it will disconnect a lower priority connection to make room. Currently Tor exits are labelled as being lower priority than regular IP addresses, as jamming attacks via Tor have been observed, and most users/merchants don't use it. In normal operation this new code will never run. If someone performs a DoS attack via Tor, then legitimate Tor users will get the existing behaviour of being unable to connect, but mobile and home users will still be able to use the network without disruption.
 

Peter Tschipper

@freetrader that's pretty interesting about XT's anti-DoS patches... I'll have to look at those. Maybe we could implement them and use them as a basis for further improvement.
 

solex

Moderator
Staff member
Aug 22, 2015
1,558
4,693
Thanks for highlighting this @Peter Tschipper
FWIW I put 37.97.164.0 into https://www.iplocation.net/ and it is located in the Netherlands.

I like the idea of limiting connections by prefix, unless a testing override argument is used.
 

theZerg

Moderator
Staff member
Aug 28, 2015
1,012
2,327
How about we map each IP into an n-dimensional geometry (say 2-dimensional), choosing the mapping function so that distance generally correlates with physical reality? Then, if swapping one of the existing connections for an incoming one would increase the volume that contains all the connections, we make that replacement.

The above is a pretty abstract description but the implementation should be pretty simple.

The advantage with respect to your proposal is that it doesn't make subnet assumptions, and the mapping function could evolve into something that uses a small amount of real geo-IP data.
 

Peter Tschipper

@theZerg I'm not sure I follow... although geo-location would certainly be beneficial, as we could target other kinds of attacks as well.

I like what dagurval is doing with https://github.com/bitcoinxt/bitcoinxt/pull/146

In Core we have:

Code:
static bool AttemptToEvictConnection(bool fPreferNewConnection)

which just doesn't work very well, and I think it needs to be cleaned up so that it prioritizes connections better. For instance, it filters on ping time and assumes that if you have a fast ping you're not an attacker, whereas here the exact opposite is true.
 

jl777

Active Member
Feb 26, 2016
279
345
What I do with iguana is rank peers using a relatively simple metric, i.e. the amount of useful data received.

Then every 6 seconds or so, I rank all the peers and prune the one(s) with the worst metric. I don't prune below 64 peers, to avoid running out of peers altogether for whatever reason.

Over time, only the useful peers are retained, and while a ping/pong peer can make a connection and use up a slot, it won't last long. For peer selection, I choose the peers most likely to be useful by selecting ones that have been useful in the past and have not recently been killed off.

Also, I partition the available slots between incoming and outgoing peers and don't rely much on incoming peers, as those can't be controlled. If they come in and rank well, then fine.

Code:
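/* Score a peer by useful data received per second of connection lifetime.
   struct iguana_peer and SMALLVAL are defined elsewhere (not shown). */
double iguana_metric(struct iguana_peer *addr,uint32_t now,double decay)
{
    int32_t duration; double metric = addr->recvblocks * addr->recvtotal;
    // decay the counters so that recent traffic weighs more than old traffic
    addr->recvblocks *= decay;
    addr->recvtotal *= decay;
    // connection age since the peer became ready; 1 if not ready yet
    if ( now >= addr->ready && addr->ready != 0 )
        duration = (now - addr->ready + 1);
    else duration = 1;
    // peers with (almost) nothing useful after 300 s get a fixed low score
    if ( metric < SMALLVAL && duration > 300 )
        metric = 0.001;
    else metric /= duration;  // otherwise: rate per unit of connection age
    return(metric);
}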
decay is set to values close to 1.0, e.g. 0.995; this way peers are ranked on their lifetime performance, with more weight given to recent values. I used to reset the addr->recv* fields after each metric calculation, but that was too choppy. Using the connection's lifetime average of valid data received (factored by blocks) per second creates a stable ranking.
 

Peter Tschipper

Core 0.12.0 has the framework in place for prioritizing connections, but it makes some poor assumptions about which connections to protect, ending up protecting the attackers instead of the good connections. I commented out those assumptions and added one small section, and now it all works very well. There is an attack currently underway: the attackers are allowed to connect, but as soon as another node wants in, one of the attacking nodes drops off. Connections that are stale and never complete an initial ping/pong get dropped as well.

https://github.com/BitcoinUnlimited/BitcoinUnlimited/pull/34

One other good thing here, similar to @dagurval's approach, is that we end up turning the tables and using up the attackers' resources, since we only disconnect them if our slots are exhausted.
 

theZerg

@Peter Tschipper your patch looks like a good start to remove poor thinking in Core.

@Peter Tschipper & @jl777
We actually have orthogonal selection criteria here:
1. Nodes that are from the broadest selection area
2. Nodes that are helpful to us.
3. Shortest ping time (this is what Core seems to be using, and what ptschip removed)

The desirability of 3 is very questionable. We actually want Bitcoin to be a scaleless network (https://en.wikipedia.org/wiki/Scale-free_network), since that kind of network has lots of desirable properties. So what we really want is for every node to select peers in a way that encourages the larger network to be scale-free. To do that, we simply need to make OUR node scaleless. Basically, if we took a random node X and graphed the number of nodes at every distance, we'd want an inverse exponential (https://www.wolframalpha.com/input/?i=e^-x).

So if we can build a function into BU that takes two IP addresses and returns a number that loosely corresponds to the "distance" between them (maybe some combination of IP-to-geo, IP bit distance and ping time), we can use it to force our connection list to model an inverse exponential, by dropping the node that causes our graph to deviate the most from the ideal. This will create a scaleless network, to the accuracy of our distance function.

As a practical matter, though, it does make sense to deviate from the ideal inverse exponential by special-casing multiple connections coming from the same IP address, since these could be a Sybil attack. Sending the same data to these nodes also has dubious value, since they no doubt have a very fast connection between them.
 

solex

Can this improvement logically be bundled with either Xthin or Datastream Compression, or does it require a separate BUIP?
 

theZerg

Seems like a bug fix to me, and of course if anyone disagrees they can create a BUIP to give the change visibility...
 

Peter Tschipper

@jl777 I like the idea of further ranking peers based on their activity: not necessarily disconnecting them, but giving them a lower priority in case a slot is needed. So a node that's just sitting there doing nothing for some period of time, maybe longer than the block download timeout, would still stay connected unless another node wants to connect.
 

freetrader

@Peter Tschipper : you mentioned that this attack was ongoing still, right?

On nodecounter.com the BU node count seems to have only decreased gradually and slightly - around 30% over the last month. The attack seems much more noticeable on the Classic nodes, where there is still irregularity after the initial recovery a day or two back.

How did you measure the efficacy of your proposed patch?
 

Peter Tschipper

@freetrader for me it was easy enough to test, since I have several of the attacker's nodes connecting to me. I could just watch my slots fill up and record the behavior. I set my maxconnections to 20 and started out with about 7 or 8 attacking nodes. As I reached 20, I could watch new connections come in, and for each new connection one of the attacker's peers would be disconnected, finally leaving just one attacking peer connected. Also, as peers disconnected, I could watch the slots fill up again with attacking nodes pretty quickly. It works well and allows the bitnodes crawlers to connect and do their work of recording the presence of our nodes on the network.

Also, you can watch the peers that have a ping of N/A. These are new connections that don't for whatever reason perform or complete the initial ping after connecting. If they don't complete the ping in 60 seconds after connecting then they will get bumped off, but only if the maxconnections is exceeded. I tend to see a few of those stale connections and I also watched as they would get disconnected in the same way as the attackers nodes.
 

jl777

Peter Tschipper said: "@jl777 I like the idea of further ranking peers based on their activity, not necessarily disconnect them, but give them a low or lower priority in case a slot is needed. So a node that's just sitting there doing nothing for some period of time, maybe longer that the block download timeout, will still stay connected unless another node wants to connect."
Yes, the least active nodes just go to the bottom of the rankings, so if peer slots are running low, they get pruned to make room.

I just reserve 1/8th of the slots to be available, and if more than 7/8ths of the slots are full, I close the connection to the worst-ranked peer. This allows new peers to connect and climb the ranks to safety if they are useful, or just stay at the bottom to be pruned.

Seems to work pretty well as far as getting max bandwidth.
 

freetrader

The attack has stopped on my node at 11:59:27 UTC today (29 April), after I sent a message to abuse@transip.nl .

The response/explanation I received from their abuse department:
Dear Sir, Madam,

I've received reply from our customer and they've explained the behavior of their server as follows:

===================================
Yesterday we received such a notice (the first time). Our company researches block chain technology as part of a final project.

We immediately investigated the amount of connections that we make to a Bitcoin node. The research shows that we tend to try to make a connection, but we had once up to 25 connections to the same node (where a node generally affects 125 connections). This has been corrected so that it can no longer occur.
We will also ensure that we do not quickly try to create a new connection when a connection is broken. Quickly re-connecting is certainly not attack, but can be misinterpreted if a connection is broken deliberately.
==========================

We assume the issue has been resolved but should you have any troubles, do not hesitate to contact us.

Should you have any more questions, please do not hesitate to ask and I wish you a nice day.

Met vriendelijke groet,
<name withheld for privacy>
TransIP BV
 

freetrader

Actually, it seems they have now resumed, but with a ~2 minute interval between their retrying a batch of connections. I get about 5 connection attempts per such batch.

I think this company / researcher should come forward and explain what they are doing.

For one, I've seen no Core node operators complain about this, so it seems targeted at a subset of the network.

Secondly, the reasoning they gave to their ISP is obviously avoiding the issue - if they do establish connections they seem to sit idly on them, wasting precious slots.

They should own up to their behaviour and explain to the community what it is they are trying to research.
Until then, I'll continue to ban them and I would suggest everyone does the same.

FYI: the precise list of IPv4 addresses that seem to be involved (corroborated by Tom Zander's logs):

37.97.164.159
37.97.164.160
37.97.164.230
37.97.164.231
37.97.164.232
37.97.164.233
37.97.164.234
37.97.164.237
37.97.164.238
37.97.164.239

Tom also mentions the IPv6 address 2a01:7c8:aac0:5ee:c82e:a215:7488:e484, though I have not found that one in my logs.
 

Peter Tschipper

@freetrader It's obvious to me that they are targeting Classic nodes, and we're getting thrown into the mix, probably along with BitcoinXT. Why else would there be no effect on Core nodes?

Also, as far as research is concerned, what are they researching... how to bring down the network? All they are doing is connecting and then ping/ponging.

All that said, I generally like these kinds of attacks, because they allow us to fix the holes now rather than only in theory. It only makes us stronger in the long run.

BTW, it looks like they've upgraded to bitcoinj 0.13.6.