Connection Slot Exhaustion attack?

freetrader

Moderator
Staff member
Dec 16, 2015
2,806
6,088
It seems obvious to me too, but it requires double-checking.

I'm also waiting for those researchers to contact me by email (they have my address), which, if they are sincere, they will have no problem doing.
 

jl777

Active Member
Feb 26, 2016
279
345
All said, I generally like these kinds of attacks because they allow us to fix the holes now rather than only in theory. They only make us stronger in the long run.
Currently no crypto network can withstand any sustained attack, especially at the higher-level protocols.
 

freetrader

Moderator
Staff member
Dec 16, 2015
2,806
6,088
I'm also waiting for those researchers to contact me by email (they have my address), which, if they are sincere, they will have no problem doing.
Just to update: the researchers did get in contact with me, and have said they plan to "soon put up a http server" on their nodes explaining their intent.

The information I received leads me to believe they are genuinely not trying to attack the network, but that their software had flaws. They have already identified some of these, e.g.:

- there could be multiple connections from the same IP address
- it would connect to different ports on the same node
- another one of them had their node retrying connections too rapidly

They state they have switched to an exponential back-off, which seems to have worked well to mitigate the effects of their experiment on Classic/BU nodes.
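For readers unfamiliar with the term, here is a minimal, purely illustrative sketch of exponential back-off on reconnect attempts; the function names and delay values are my own assumptions, not the researchers' actual code:

```cpp
// Illustrative sketch only (not the researchers' code): each failed
// connection attempt doubles the wait before the next one, up to a cap.
#include <algorithm>
#include <chrono>
#include <thread>

void ReconnectWithBackoff(bool (*TryConnect)())   // TryConnect is hypothetical
{
    std::chrono::seconds delay(5);                // assumed initial delay
    const std::chrono::seconds maxDelay(600);     // assumed cap of 10 minutes

    while (!TryConnect()) {
        std::this_thread::sleep_for(delay);
        delay = std::min(delay * 2, maxDelay);    // exponential back-off
    }
}
```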

I encourage everyone to keep monitoring the situation, and to try to determine whether there are other, unrelated attacks in progress.

Also, the researchers in this experiment state they are only using these 10 specific IPs:

37.97.164.{159,160,230,231,232,233,234,237,238,239}
 

freetrader

Moderator
Staff member
Dec 16, 2015
2,806
6,088
@Peter Tschipper : That credit is due to Tom Zander of Classic, who notified folks of the network provider and their abuse address. I just contacted the provider, who put me in touch.

I guess I don't often have faith that these mechanisms work, but the Network Coordination Centres do provide good information on who's responsible for certain IP ranges. In this case the information could be found through https://apps.db.ripe.net/search/query.html .

Good on the provider (transip.nl) for having a competent and professional abuse department.
 

Peter Tschipper

Active Member
Jan 8, 2016
254
357
@jl777

decay is set to values close to 1.0, i.e. 0.995; this way nodes are ranked based on their lifetime performance, with more weight given to more recent values. I used to reset the addr->recv* fields after each metric calculation, but that was too choppy. Using the connection's lifetime average of valid data received (factored by blocks) per second creates a stable ranking
With the recent ongoing connection problems, and now the adaptive behavior of those doing the attack (or study of some kind), it's clear that just de-prioritizing IP groups isn't going to be enough moving forward. I like your method of determining whether a node is useful by ranking nodes by traffic. I'm wondering, though, whether you've tried ranking the traffic over the last 10 or 20 minutes rather than over the whole lifetime, so that a node that starts to have a problem or is hung can be de-prioritized as well. It's easy enough to decay the values over a short time period, as is done in -limitfreerelay, but I wanted to know whether you've already gone down that path and what your thoughts are in that regard.
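To make the two options concrete, here is a rough sketch of both ranking styles under discussion; the struct and function names are hypothetical, not the actual Iguana or BU code (only the 0.995 decay value comes from the quoted post, the 0.9 short-window value is an assumption):

```cpp
// Hypothetical sketch of the two ranking approaches being discussed.
struct PeerScore {
    double recvPerSec = 0.0;   // smoothed "valid bytes received per second"
};

// Lifetime-style ranking: decay very close to 1.0 (e.g. 0.995) so the score
// reflects lifetime performance while still weighting recent samples slightly more.
void UpdateLifetimeScore(PeerScore& p, double bytesThisInterval, double seconds)
{
    const double decay = 0.995;                       // value from the quoted post
    double sample = bytesThisInterval / seconds;
    p.recvPerSec = decay * p.recvPerSec + (1.0 - decay) * sample;
}

// Short-window variant being asked about: a faster decay so a peer that hangs
// or goes quiet for ~10-20 minutes drops in the ranking quickly.
void UpdateShortWindowScore(PeerScore& p, double bytesThisInterval, double seconds)
{
    const double decay = 0.9;                         // hypothetical, tuned for a short window
    double sample = bytesThisInterval / seconds;
    p.recvPerSec = decay * p.recvPerSec + (1.0 - decay) * sample;
}
```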
 

Peter Tschipper

Active Member
Jan 8, 2016
254
357
@jl777

Actually, come to think of it, decaying the activity over 20 minutes could just become another attack vector, since an attacker only has to be very active for 20 minutes to start bumping off every other node. I think that following your solution, using the overall activity of the node from the beginning without any decay, would keep the connections stable and not subject to being bumped off. If, as you say, a node has historically been useful, then it should maintain its priority! It then becomes a simple matter of verifying that the useful nodes have not become completely stale and inactive, which would be the only justification for bumping them off or de-prioritizing them.
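A minimal sketch of that idea (lifetime totals are never decayed, and only a completely stale peer becomes an eviction candidate); the names and the threshold are hypothetical:

```cpp
// Hypothetical sketch: keep an undecayed lifetime total, and only consider a
// peer for bumping if it has gone completely quiet for a long time.
#include <cstdint>

struct PeerActivity {
    uint64_t totalUsefulBytes = 0;   // lifetime activity, never decayed
    int64_t  lastUsefulTime   = 0;   // unix time of the last useful message
};

bool IsEvictionCandidate(const PeerActivity& p, int64_t now,
                         int64_t staleSeconds = 4 * 60 * 60 /* assumed threshold */)
{
    return now - p.lastUsefulTime > staleSeconds;
}
```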
 

Peter Tschipper

Active Member
Jan 8, 2016
254
357
@jl777

I updated PR34 [WIP] to also de-prioritize by total bytes sent/received. I tried this with maxconnections set to 20 and then unbanned all my banned nodes. It works very well, and it kind of gave me a chuckle to see the constant influx of attackers' nodes bumping each other off while leaving the useful nodes running. Meanwhile, when I went to check my node status on bitnodes, I found my node up and running!

If anybody wants to run this and try it out, let me know what you think: PR#34

One other thing that might be useful is to not just count total bytes sent/received, but to leave out outbound INV messages and ping/pongs, since a node generating only that traffic is basically just connected and listening to INVs without providing any useful data or doing any getdata. Just to be clear, though: we're not disconnecting those nodes, just de-prioritizing them in the event the connection slots fill up.
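As a sketch of the eviction rule described here (not the actual PR#34 code): when the slots are full, the peer with the lowest activity total gets bumped rather than banned. The types and names below are illustrative:

```cpp
// Illustrative only, not PR#34: pick the peer with the smallest activity
// total as the one to bump when a new connection needs a slot.
#include <cstdint>
#include <limits>
#include <vector>

struct PeerSlot {
    int      id = 0;
    uint64_t activityBytes = 0;   // bytes counted as useful traffic
};

// Returns the id of the least active peer, or -1 if no slots are occupied.
int SelectPeerToBump(const std::vector<PeerSlot>& peers)
{
    int worst = -1;
    uint64_t lowest = std::numeric_limits<uint64_t>::max();
    for (const PeerSlot& p : peers) {
        if (p.activityBytes < lowest) {
            lowest = p.activityBytes;
            worst  = p.id;
        }
    }
    return worst;   // the caller disconnects (not bans) this peer
}
```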
 
  • Like
Reactions: freetrader

solex

Moderator
Staff member
Aug 22, 2015
1,558
4,693
@Peter Tschipper
I like this improvement too.
I updated PR34 [WIP] to also de-prioritize by total bytes sent/received.
Also, can this be refined further so that only valid data is counted? i.e. transactions which persist in the mempool, and blocks which have valid PoW and can persist in the blockchain (whether later orphaned or not)?
 

Peter Tschipper

Active Member
Jan 8, 2016
254
357
@solex I also added NOT counting outgoing INVs, as well as not counting VERACK, VERSION, PING and PONG both inbound and outbound. Tested it today and it works very well IMO.

Currently I've left the extra logging in the code, so if anybody runs it you can see what the byte counts are for each node and also which one gets bumped when the slots are full.
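Roughly how such a filter could look (illustrative only, not the actual patch); only messages passing this check would add to a peer's activity bytes:

```cpp
// Sketch of the message filter described above (not the actual code): only
// certain message types count toward a peer's activity total.
#include <string>

bool CountsTowardActivity(const std::string& command, bool outbound)
{
    // Handshake and keep-alive traffic is ignored in both directions.
    if (command == "version" || command == "verack" ||
        command == "ping"    || command == "pong")
        return false;

    // Outgoing INVs only announce what we have; they say nothing about
    // whether the peer is feeding us useful data, so skip them too.
    if (outbound && command == "inv")
        return false;

    return true;
}
```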
 

jl777

Active Member
Feb 26, 2016
279
345
I suggest you only count useful bandwidth, i.e. new block data and new block header data. A node can send you the genesis block a zillion times, but that is useless bandwidth. What matters most is the first node that sends you the useful data. If you want to get fancy, maintain a reference count for each valid item and boost the cumulative value by Nbytes/refcount.

This has the benefit of stability and of giving partial credit to nodes that are a bit slow. But if a node is constantly the tenth to send you useful data (a slow connection), then it should drop in priority relative to a fast node that has sent just as much data.

The nodes will then cluster, with fast, low-latency, long-term reliable nodes at the top and slow, inconsistent (or brand-new) nodes at the bottom. This data might be useful to retain across sessions, but it would need to be decayed at the start of each session, since if a node has been up for 3 years, 3-year-old peer data isn't really useful.

I don't suggest using raw bandwidth, as that includes a lot of things that don't matter; and if you use the size of the logical data received, then nodes that use compression will get the same priority as nodes that don't. Actually, right now they would rank lower, as they would usually be a bit slower.
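A sketch of the Nbytes/refcount credit described above, assuming a simple map keyed by a block or header hash; the names are illustrative, not Iguana's actual data structures:

```cpp
// Sketch of the "Nbytes / refcount" credit; the item key is just a string
// stand-in for a block or header hash.
#include <cstdint>
#include <map>
#include <string>

std::map<std::string, int> g_itemRefCount;   // how many peers have sent this item

// Credit a peer for delivering an item: the first sender gets full credit,
// the second half, the tenth a tenth, so slow-but-honest peers still earn
// something while consistently slow peers drift down the ranking.
double CreditForItem(const std::string& itemHash, uint64_t nBytes)
{
    int refcount = ++g_itemRefCount[itemHash];
    return static_cast<double>(nBytes) / refcount;
}
```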
 
  • Like
Reactions: freetrader

Peter Tschipper

Active Member
Jan 8, 2016
254
357
@jl777 Although our current solution of evicting the least active node is working fine, I'm starting to implement decay of the activity bytes over a period of time as well. I think you said you had implemented that in Iguana, and I was wondering what period of time you found to work best. I took a guess at 12 hours, but I'm not sure whether something shorter would be better; maybe 6 or 4 hours would be more appropriate. Any experience you can share would be helpful... thanks.
 

jl777

Active Member
Feb 26, 2016
279
345
With iguana, sync speed is the highest priority, so I want to phase out slower peers sooner rather than later and promote fast nodes.

Currently I just >>= 1 (divide by 2) all existing activity totals each time I rank the peers, every 5 to 10 minutes. So after several rankings it can end up with a dramatically different set of nodes, but what I am seeing is that fast nodes tend to stay fast and slow nodes tend to stay slow. You could also decay by 10% each time with a decay factor of 0.9; it doesn't seem to be too sensitive to the exact parameters.

This might seem very fast, but iguana syncs the BTC chain in about an hour, so I matched the decay to the download time.

Of course, on slower-bandwidth systems it probably makes sense to decay a bit more slowly, but I am not seeing much harm in the fast phase-out/phase-in, as it does spread the load out across different nodes.
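A sketch of that periodic halving, applied at each ranking pass; the field names are illustrative rather than Iguana's actual code:

```cpp
// Sketch of the periodic ">>= 1" decay described above, applied each time the
// peers are re-ranked (every 5 to 10 minutes in the post).
#include <algorithm>
#include <cstdint>
#include <vector>

struct Peer {
    int      id = 0;
    uint64_t activity = 0;   // bytes of useful data since the last few rankings
};

void RankAndDecay(std::vector<Peer>& peers)
{
    // Sort so the most active (fastest, most useful) peers come first.
    std::sort(peers.begin(), peers.end(),
              [](const Peer& a, const Peer& b) { return a.activity > b.activity; });

    // Halve every total so older activity fades after a few ranking rounds.
    for (Peer& p : peers)
        p.activity >>= 1;
}
```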
 

Peter Tschipper

Active Member
Jan 8, 2016
254
357
Thanks for the reply... sounds like that'll be a little fast for us, but I might try 2 to 4 hours. I'll experiment with that... thanks again.
 

jl777

Active Member
Feb 26, 2016
279
345
Anytime!
Calibrating things so that the vast majority of peer performance changes are reflected by the time half the blockchain is downloaded is probably a good balance.