BUIP021: Proposal for implementation of node performance testing

Inca

BUIP 21: Proposal for implementation of node performance testing and safe block size estimation and broadcast

Proposer: Peter Waterland
Date: 18/07/2016

Background:
A major concern raised over on-chain scaling is the presumption that a rising block size will harm decentralisation of the network and lead to a falling number of non-validating nodes. No assessment of true individual node performance, nor any means to gauge it, currently exists for the bitcoin network. Armed with this knowledge it would be possible to objectively evaluate the performance increases the network could endure without losing nodes for purely technical reasons (rising cost excluded). If nodes on the network broadcast a 'safe' block size threshold up to which they can perform, then miners can be confident in safely raising the block size in the future.

Proposal:

1)
This proposal is to incorporate a suite of simple benchmarking tests into the Bitcoin Unlimited node software package, allowing a specification of the node to be recorded in terms of: CPU performance, memory capacity, storage space and, perhaps most importantly, network bandwidth capacity.

This information can be passed across the network upon request to allow analysis of the specifications and capabilities of the nodes making up the network as a whole.
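To make this concrete, here is a minimal sketch of the kind of record such a suite might produce and serve on request. The struct and all field names are illustrative assumptions, not an existing BU data structure or protocol message:

```cpp
// Hypothetical sketch of the benchmark record a node might store and serve
// on request; every field name here is illustrative, not part of any
// existing BU protocol.
#include <cstdint>

struct NodeSpec {
    uint32_t ecdsaVerifiesPerSec;  // CPU: measured signature verifications/s
    uint64_t totalMemoryBytes;     // RAM available to the node
    uint64_t freeDiskBytes;        // storage headroom for blockchain growth
    uint32_t downstreamKbps;       // measured download bandwidth
    uint32_t upstreamKbps;         // measured upload bandwidth
    uint32_t safeBlockSizeKb;      // derived 'safe up to X' value, in kB
};
```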

2)
After running the node performance suite, each node will be assigned a 'safe up to X MB block size' value which can be broadcast to the network and is independent of other node settings. The default setting for this should be active, with a user-configurable option to disable it.

Data gathering required prior to implementation:
The team will run an experimental testnet of BU nodes at different gradations of maximum block size from 1 MB to 50 MB (and with varying numbers of connected nodes) to generate simulated mainnet bandwidth traffic between nodes (Xthin enabled), which will be measured.

This data will then be used, along with estimations for ECDSA validations and calculated disk space for a year of blockchain growth at each maximum block size between 1 MB and 50 MB, to create a table of recommended node specifications for each block size greater than 1 MB, at 0.5 MB intervals.

Detail of performance testing:

Code for benchmark testing may either be standalone, to be run periodically and generate a report for the node to use, or could be merged directly into the codebase.

Fine details of testing routines to be left to developers.

CPU performance:
A testing routine should be performed which assesses the maximum number of ECDSA validations the node is able to perform per second. Fairly trivial.
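As a sketch of how trivial this is, the loop below times signature verifications using libsecp256k1 (which Bitcoin nodes already use for signature validation); the fixed key, message and iteration count are illustrative:

```cpp
// Minimal sketch: measure ECDSA verifications per second with libsecp256k1.
// The key, message and iteration count are illustrative only.
#include <secp256k1.h>
#include <chrono>
#include <cstdio>
#include <cstring>

int main() {
    secp256k1_context* ctx = secp256k1_context_create(
        SECP256K1_CONTEXT_SIGN | SECP256K1_CONTEXT_VERIFY);

    unsigned char seckey[32];
    memset(seckey, 0x42, sizeof(seckey));  // fixed test key (valid scalar)
    unsigned char msg32[32];
    memset(msg32, 0x01, sizeof(msg32));    // fixed test message hash

    secp256k1_pubkey pubkey;
    secp256k1_ecdsa_signature sig;
    secp256k1_ec_pubkey_create(ctx, &pubkey, seckey);
    secp256k1_ecdsa_sign(ctx, &sig, msg32, seckey, NULL, NULL);

    const int kIterations = 20000;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < kIterations; ++i)
        secp256k1_ecdsa_verify(ctx, &sig, msg32, &pubkey);
    double secs = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start).count();

    printf("%.0f ECDSA verifications/sec\n", kIterations / secs);
    secp256k1_context_destroy(ctx);
    return 0;
}
```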

Memory capacity:
This is even more straightforward: the total available memory capacity of the node (in gigabytes) is recorded, giving a theoretical maximum mempool size.
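For example, on Linux/glibc the total can be read with sysconf; a minimal sketch:

```cpp
// Minimal sketch (Linux/glibc): record total physical memory in gigabytes.
#include <unistd.h>
#include <cstdio>

int main() {
    long pages = sysconf(_SC_PHYS_PAGES);
    long pageSize = sysconf(_SC_PAGE_SIZE);
    double gb = (double)pages * pageSize / (1024.0 * 1024.0 * 1024.0);
    printf("Total memory: %.1f GB\n", gb);
    return 0;
}
```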

Network assessment:
A testing routine to assess maximum available bandwidth (up and downstream) to the bitcoin network.
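This is the trickiest of the four. One hedged sketch, assuming a cooperating benchmark peer that streams a payload of known size (such a peer is an assumption, not an existing facility):

```cpp
// Hypothetical sketch: time the download of a known-size payload from a
// cooperating benchmark peer over an already-connected socket.
#include <sys/socket.h>
#include <chrono>
#include <vector>

double MeasureDownstreamMbps(int sockfd, size_t payloadBytes) {
    std::vector<char> buf(65536);
    size_t received = 0;
    auto start = std::chrono::steady_clock::now();
    while (received < payloadBytes) {
        ssize_t n = recv(sockfd, buf.data(), buf.size(), 0);
        if (n <= 0) break;  // peer closed the connection or an error occurred
        received += (size_t)n;
    }
    double secs = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start).count();
    return (received * 8.0 / 1e6) / secs;  // megabits per second
}
```

Sampling several peers and keeping the best result would reduce the chance of measuring a slow peer's uplink rather than the node's own link.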

Storage:
The maximum disk space available to the node is recorded. This can then be used to estimate blockchain growth capacity in years at various block sizes.
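A minimal sketch of both measurements, assuming a POSIX system and deliberately pessimistic full blocks at ~144 blocks per day:

```cpp
// Minimal sketch (POSIX): free disk space, and the years of blockchain
// growth it could absorb at a given maximum block size, assuming full
// blocks and ~144 blocks/day (both deliberately pessimistic).
#include <sys/statvfs.h>
#include <cstdio>

int main() {
    struct statvfs fs;
    if (statvfs("/", &fs) != 0) return 1;   // or the node's data directory
    unsigned long long freeBytes =
        (unsigned long long)fs.f_bavail * fs.f_frsize;

    double maxBlockMb = 16.0;               // illustrative setting
    // e.g. 16 MB * 144 blocks/day * 365 days ~= 840 GB/year
    double bytesPerYear = maxBlockMb * 1e6 * 144 * 365;
    printf("Free: %.1f GB, capacity: %.1f years at %.0f MB blocks\n",
           freeBytes / 1e9, freeBytes / bytesPerYear, maxBlockMb);
    return 0;
}
```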

Safe for blocksize setting:
Once the above benchmarking tests have been performed, a value will be generated to signal to the network that the node has the required capacity (CPU power, memory, disk space and network bandwidth) to tolerate the data associated with a maximum block size up to a given value.

The node will compare the results of the benchmark tests with the table generated/supplied by the development team to select the appropriate value. A suitable safety margin will be integrated.
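A hedged sketch of that comparison, reusing the hypothetical NodeSpec record from earlier; the table rows and the 0.75 margin are placeholders, not the real development-team table:

```cpp
// Hypothetical sketch: pick the largest 'safe' block size whose required
// specs the node meets, after discounting the benchmarks by a safety
// margin. Uses the NodeSpec struct sketched earlier in the proposal.
#include <cstdint>
#include <vector>

struct SpecRow {
    double blockSizeMb;            // candidate maximum block size
    uint32_t minVerifiesPerSec;    // required ECDSA verification rate
    uint64_t minMemoryBytes;       // required memory
    uint32_t minDownstreamKbps;    // required download bandwidth
    uint64_t minFreeDiskBytes;     // required storage headroom
};

double SafeBlockSizeMb(const NodeSpec& node,
                       const std::vector<SpecRow>& table,
                       double safetyMargin /* e.g. 0.75 */) {
    double best = 1.0;  // every node is assumed safe at 1 MB
    for (const SpecRow& row : table) {
        if (node.ecdsaVerifiesPerSec * safetyMargin >= row.minVerifiesPerSec &&
            node.totalMemoryBytes    * safetyMargin >= row.minMemoryBytes &&
            node.downstreamKbps      * safetyMargin >= row.minDownstreamKbps &&
            node.freeDiskBytes       * safetyMargin >= row.minFreeDiskBytes &&
            row.blockSizeMb > best)
            best = row.blockSizeMb;
    }
    return best;
}
```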

As this is Bitcoin Unlimited, this will be activated by default but will be user-configurable to deactivate.

Conclusion:
Decentralisation is vital to the bitcoin network. It seems obvious that some form of network assessment to identify node specifications should be integrated into node software prior to network performance upgrades. Bitcoin Unlimited has the opportunity to be the first to integrate this.

Voting:
Voting will be a single YES or NO on both integrating performance testing into BU and activating a (user-configurable) 'safe for X MB block size' setting which is visible to the network.

Final comments please, prior to voting.

EDIT 1: Changed it to BUIP 21 :)
EDIT 2: Removed unnecessary detail, added a few bits, title and formatting.
 

freetrader

I like the general idea of getting more information from the network, hopefully reducing the need for third parties (looking at you, certain "do-gooder" companies) to conduct disruptive experiments which can be mistaken for attacks.
 

theZerg

This is an interesting proposal. Any privacy advocates have concerns? Do we care that nodes with modified versions of BU could lie?

What is the node latency? Is that between the target and the requesting node? To a random sampling of nodes? Typically a node's latency to random peers in a P2P network will vary greatly.

The same is true of bandwidth -- it might be difficult to determine whether the bottleneck is on the sender's or the receiver's side...

Still, the intent is clear. I'm not sure these implementation details need to be spelled out in the BUIP unless you have particular reasons...
 

Inca

Thanks.

You are right, the fine details of bandwidth assessment do not need to be in the BUIP.

However, the method used to calculate the 'safe for X blocksize' value should probably be reasonably clear to voters. I will tidy this up later today.

With regard to Sybil BU nodes falsifying this information: it is possible, but probably easy to detect, no? By this I mean the sudden arrival of new nodes proclaiming poor blocksize safety scores.
 

freetrader

> Any privacy advocates have concerns?

Not really - as long as the gathering/publication of stats can be turned on and off, I don't see a particular problem.

Miners will factor in the probability that the information has been falsified - it would be just one source of information at their disposal. In the worst case it becomes spammed and they stop paying attention to it, leaving code which no-one uses in practice.
It feels like one of those "you have to try it out to see if it will work" situations, where it may well be that useful information can be extracted.

I don't yet see how an individual node can come to a *meaningful* assessment of what block size would be safe for itself if this is really an emergent property of the network, primarily dependent on the paths from the node to well-connected miners. I think the first step would be to clarify the assumptions underlying such benchmarking of nodes, e.g. that for CPU we assume that the node has some dedicated share of processing power, and tread carefully in situations where that is not the case...
 

Peter Tschipper

I like the idea of more data and opening up the black box that we're dealing with now. But I would think it's an easy attack vector: someone could spin up a bunch of fake nodes putting out fake data to make it look like we're out of capacity and need to keep blocks smaller... There needs to be some way around that IMO.
 

solex

I like the principle of the idea and think that data collation across many nodes would give an indication of capacity limits while preserving node count.

Fake nodes with spurious data are a real risk and would need filtering out. Perhaps this could be done using the same solution proposed for the connection slot attack: only performance data from nodes which provide useful p2p data is collated.
 

Inca

Having ruminated on this a bit: it is a tough problem to definitively tell attacker and normal nodes apart in an anonymous p2p network!

Honest nodes will have exactly the same codebase, depending on version.
Sybil nodes will have a slightly different codebase.

The only difference other than codebase I can see is that honest nodes are more likely to be older than attacker nodes and to have done more legitimate work, with associated costs. The more cost to the attacker, the less likely the node is to be a Sybil node.

So this will either be a case of recording data over a long period and excluding newer nodes whose performance specifications lie a certain number of standard deviations outside the normal distribution of the network (perhaps incorporating what you suggest @solex) - a sketch of this kind of filter is below. This is probably the way to go, as it seems unlikely that attackers would fund a Sybil attack lasting months at a time, whilst also strengthening the BU nodeshare of the network, just to alter node performance statistics slightly.
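A hedged sketch of such a filter, with illustrative thresholds (90 days of observed age, 2 sigma):

```cpp
// Hypothetical sketch of the filtering idea: ignore reports from nodes that
// are both young and far outside the network's performance distribution.
// The age and sigma thresholds are illustrative only.
#include <cmath>
#include <vector>

struct NodeReport {
    double perfScore;  // e.g. claimed safe block size in MB
    int ageDays;       // how long this node has been observed on the network
};

std::vector<NodeReport> FilterSuspected(const std::vector<NodeReport>& in,
                                        int minAgeDays = 90,
                                        double maxSigma = 2.0) {
    if (in.empty()) return {};
    double mean = 0, var = 0;
    for (const auto& r : in) mean += r.perfScore;
    mean /= in.size();
    for (const auto& r : in) var += (r.perfScore - mean) * (r.perfScore - mean);
    double sigma = std::sqrt(var / in.size());

    std::vector<NodeReport> out;
    for (const auto& r : in)
        // keep established nodes; keep young nodes only if unremarkable
        if (r.ageDays >= minAgeDays ||
            std::fabs(r.perfScore - mean) <= maxSigma * sigma)
            out.push_back(r);
    return out;
}
```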

Or I wondered about implementing some code which forces the node to prove its codebase is legitimate at a certain time. Say the node broadcasts a hash of the codebase plus the latest block hash (as a timestamp), for example. But all this would prove is that the node has access to a copy of the valid codebase.

Extending this idea slightly: what if we made it difficult for an attacker by forcing them to prove not just a valid codebase on the machine (a hash of the executable) but a valid entire installation, including the blockchain files and the codebase together, timestamped to the latest block hash? A rough sketch of this follows.
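Something like this, using OpenSSL's SHA-256; the file list and challenge format are assumptions:

```cpp
// Rough sketch of the installation-commitment idea: hash the node binary
// and blockchain files together with the latest block hash, so the digest
// cannot be precomputed. Paths and challenge format are illustrative.
#include <openssl/sha.h>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

static void HashFile(SHA256_CTX* ctx, const std::string& path) {
    std::ifstream f(path, std::ios::binary);
    std::vector<char> buf(1 << 16);
    while (f.read(buf.data(), buf.size()) || f.gcount() > 0)
        SHA256_Update(ctx, buf.data(), f.gcount());
}

std::string InstallationCommitment(const std::vector<std::string>& files,
                                   const std::string& latestBlockHash) {
    SHA256_CTX ctx;
    SHA256_Init(&ctx);
    for (const auto& path : files)  // e.g. the executable plus blk*.dat files
        HashFile(&ctx, path);
    SHA256_Update(&ctx, latestBlockHash.data(), latestBlockHash.size());

    unsigned char digest[SHA256_DIGEST_LENGTH];
    SHA256_Final(digest, &ctx);
    char hex[2 * SHA256_DIGEST_LENGTH + 1];
    for (int i = 0; i < SHA256_DIGEST_LENGTH; ++i)
        sprintf(hex + 2 * i, "%02x", digest[i]);
    return std::string(hex, 2 * SHA256_DIGEST_LENGTH);
}
```

Of course an attacker who keeps a full legitimate installation around can still answer the challenge, so this raises the attacker's cost rather than proving honesty.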

I cannot think of anything else at present other than node certification via a third-party data-gathering site - which is probably against the ethos of bitcoin and not likely to be popular.

I still think we should press on regardless :) Most nodes will be honest, and I still think it will be easy to spot a Sybil attack even if we cannot definitively differentiate Sybil nodes from genuine BU nodes until we analyse the data.
 

Peter Tschipper

> Fake nodes with spurious data are a real risk and would need filtering out. Perhaps this could be done using the same solution proposed for the connection slot attack: only performance data from nodes which provide useful p2p data is collated.
@solex I think it is a possibility... I don't see anything stopping anyone from doing that other than time and effort.

The current crawlers at bitnodes.21.co just do a connect and then download the addr (address) list; they don't do any checking beyond that. But it would be possible for miners or some other entity, even bitnodes.21, to run a few checks on each node: maybe just an initial one, and after that a re-check once every week or two. They could listen to each node for 10 minutes, download tx data, and maybe exchange a block to make sure it's a real node... About 10 or 20 validating crawlers would be enough, maybe even fewer.
 

HelloGuy

Satoshi-series clients will only serve the Bitcoin protocol for the near term. I think this data would be very useful to collect in a testing environment, publishing the results for developers.