BUIP Proposal: Formalization of Stress Tests

priestc

Member
Nov 19, 2015
Over the past year or so there have been a few "stress tests". Personally, I believe running these is a good idea. The results of livenet stress tests are public and therefore more believable as valid. The problem is that the start and end times of these stress tests are poorly defined. I don't even know how (or by whom) the previous stress tests were planned. Stress tests are most effective when 100% of the stress traffic arrives at the same time. If the traffic is spread out over a couple of hours or days, it fails to expose bottlenecks as effectively as traffic that arrives all at once. In order to get the entire community to spam the network at precisely the same time, some coordination is in order. The purpose of this BUIP is to define that coordination.

Below is an unordered list of points that this BUIP should elaborate:

  • BCH is not beta software. Real economic activity exists on the system today. Therefore it is important that these stress tests are designed not to disrupt the network's ability to be used as money.
  • The stress tests will happen every Monday. The spam phase starts at exactly 12:00 noon GMT and ends at 12:07 PM GMT.
  • The very next block will be the "stress block".
  • The very next block after the stress block will be a normal block and will signal the end of the stress test.
  • The first actual stress block will happen on the first Monday after this BUIP is passed by the BU vote committee.
  • Each test will have one of three outcomes: Passed, Failed, or Mistest.
  • Passed: the network makes a block bigger than the previous week's "stress block".
  • Failed: the network tried to make a block big enough to clear out the mempool entirely, but that block failed to propagate (in other words, multiple orphan blocks were created during the test).
  • Mistest: the people who are supposed to spam the network fail to fill up the mempool, and the stress block is not bigger than the previous week's stress block.
  • Multiple implementations of a BUIP-compatible spam script can exist, each programmed to spam during the 7-minute window (a minimal sketch follows this list). Such a tool already mostly exists; existing stress test software would just have to be modified to conform to the schedule defined in this BUIP.
  • The spam script will make transactions with zero fees. Nodes and miners opt in to the stress test by lowering their minimum relay fee to zero for the spam period (the first 7 minutes).
  • Miners that do not want to participate in the stress test can simply ignore all zero-fee transactions during the spam period and only make a block with tx's that pay a fee. This will likely fail the stress test, but that is fine: miners have that right, and there will be another test the next week anyway.
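
To make the coordination concrete, here is a minimal sketch of what a BUIP-compatible spam script could look like. It is only an illustration, not the official tool: it assumes python-bitcoinrpc, a local node reachable over JSON-RPC (the URL and credentials are placeholders), a funded wallet, and a node started with minrelaytxfee=0 so zero-fee transactions are accepted. The zero fee comes from spending each UTXO back to ourselves for its full amount.

```python
# Hypothetical BUIP spam-script sketch (not an official implementation).
# Assumes python-bitcoinrpc, a funded wallet, and a node running with
# minrelaytxfee=0. The RPC URL and credentials below are placeholders.
import datetime
import time

from bitcoinrpc.authproxy import AuthServiceProxy

SPAM_SECONDS = 7 * 60  # the 7-minute spam phase defined above

def next_monday_noon_utc(now=None):
    """Next Monday 12:00 GMT, the start of the spam phase."""
    now = now or datetime.datetime.utcnow()
    days_ahead = (0 - now.weekday()) % 7  # Monday is weekday 0
    start = (now + datetime.timedelta(days=days_ahead)).replace(
        hour=12, minute=0, second=0, microsecond=0)
    if start <= now:
        start += datetime.timedelta(days=7)
    return start

def send_zero_fee_tx(rpc):
    """Spend one UTXO back to ourselves for its full amount: a zero-fee tx."""
    utxo = rpc.listunspent(0)[0]  # minconf=0 lets us chain unconfirmed change
    inputs = [{"txid": utxo["txid"], "vout": utxo["vout"]}]
    outputs = {rpc.getnewaddress(): utxo["amount"]}  # input == output, fee == 0
    raw = rpc.createrawtransaction(inputs, outputs)
    # BCH-era nodes use signrawtransaction; newer Bitcoin Core renamed it
    # signrawtransactionwithwallet.
    signed = rpc.signrawtransaction(raw)
    return rpc.sendrawtransaction(signed["hex"])

def main():
    rpc = AuthServiceProxy("http://rpcuser:rpcpass@127.0.0.1:8332")
    start = next_monday_noon_utc()
    time.sleep(max(0, (start - datetime.datetime.utcnow()).total_seconds()))
    deadline = time.time() + SPAM_SECONDS
    while time.time() < deadline:
        print("sent", send_zero_fee_tx(rpc))

if __name__ == "__main__":
    main()
```

The fixed-schedule logic is the important part: independent implementations computing the same Monday noon GMT start time is what lets all the stress traffic arrive at once.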
Other thoughts:
  • Should the test be once a week, or maybe once every other week?
  • The purpose of all of this is to find the maximum size a block can reach before it results in a "failure". That failure point should be the block size limit. Once a stress test shows that a stress block can be bigger than the limit, the new stress block size becomes the protocol block size limit.
  • These tests should happen on a regular schedule indefinitely.
  • Passed, Failed, and Mistest are probably not enough to describe all possible outcomes of these tests. Maybe other states can be added?
  • Maybe "stress test" is not the best way to frame such a process... Maybe another name is needed? Perhaps "Capacity Test" or "Network Performance Test"...
 

wrstuv31

Member
Nov 26, 2017
"That failure point should be the block size limit."

Why?

It makes more sense to say that the block size limit should be moved well above (10x) that failure point.
 

priestc

Member
Nov 19, 2015
Because "failure" is defined as "orphans galore". If orphans galore occurs, then it's bad for the network and proof that the block that caused orphans galore was too big. "Orphans galore" is bad because it creates a condition that makes it easier to double spend a zero confirm transaction.

The problem is that it's not possible today for a node to know the overall network orphan rate. Not every node knows about every orphan that occurs. It's possible for an orphan block to occur without a given node ever knowing it existed. As I understand it, a node only knows about an orphan block when it has to revert that block because it found a chain with more PoW... Before the results of the stress test can be used to construct the blocksize consensus rule, there needs to be "orphan proofs" or "orphan alerts" or something like that...
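
For illustration, here is roughly the extent of the orphan visibility a single node has today, using the standard getchaintips RPC (connection details are placeholders; this is a sketch of the limitation, not a proposed mechanism):

```python
# Sketch: what one node can see of stale/orphaned chains via getchaintips.
# It reports only the forks this particular node happened to receive,
# which is exactly the visibility problem described above.
from bitcoinrpc.authproxy import AuthServiceProxy

rpc = AuthServiceProxy("http://rpcuser:rpcpass@127.0.0.1:8332")  # placeholder

for tip in rpc.getchaintips():
    if tip["status"] != "active":
        print("stale branch: height {height}, length {branchlen}, "
              "status {status}".format(**tip))
```

Aggregating this view across many cooperating nodes is roughly what an "orphan alert" system would have to do.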
 

wrstuv31

Member
Nov 26, 2017
"Because "failure" is defined as "orphans galore"."

Can you describe how you imagine the failure happening, and why this means that the blocksize limit should be set at the point of failure defined by you?

It's a bit different if the txs are no-fee vs. fee-paying.
 

priestc

Member
Nov 19, 2015
"Can you describe how you imagine the failure happening, and why this means that the blocksize limit should be set at the point of failure defined by you"

Just like the Supreme Court said about pornography, "We can't define exactly what it is and is not, but we know it when we see it", I feel the same way about blocks being too big. I don't know exactly what it'll look like, but we'll all know it when we see it. My best guess is that the easiest way to detect failure is lots of orphan blocks. Another thing we might see is a long wait until the next block is found. We may also see something nobody expected... The tests have to be performed to observe what happens.
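
As a rough illustration of the "long wait until the next block" signal, here is a sketch that computes recent inter-block intervals from header timestamps (header times are set by miners and only loosely accurate, so the numbers are indicative at best; RPC details are placeholders):

```python
# Sketch: recent inter-block intervals from block header timestamps.
from bitcoinrpc.authproxy import AuthServiceProxy

rpc = AuthServiceProxy("http://rpcuser:rpcpass@127.0.0.1:8332")  # placeholder

def block_intervals(n=20):
    """Seconds between each of the last n blocks."""
    tip = rpc.getblockcount()
    times = [rpc.getblockheader(rpc.getblockhash(h))["time"]
             for h in range(tip - n, tip + 1)]
    return [later - earlier for earlier, later in zip(times, times[1:])]

print(block_intervals())  # an unusually large entry hints at a stall
```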

The purpose of the test result is to make it easier for non-technical holders to follow what is going on with these tests.

"It's a bit different if the txs are no-fee, vs fee-paying."

There are three blocksize "limits"

1. The theoretical limit (aka "hard cap"): currently 32MB
2. The observed limit (aka "soft cap"): currently 22MB
3. The economic limit (average size of all non-stress blocks): currently 50KB or something like that...

The economic limit is not what the stress test is meant to test. It's meant to test the observed limit.
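
The "economic limit" above is just the average size of recent ordinary blocks, which anyone can measure against their own node; a minimal sketch (RPC details are placeholders, and ~144 blocks approximates one day):

```python
# Sketch: estimate the "economic limit" as the mean size of recent blocks.
from bitcoinrpc.authproxy import AuthServiceProxy

rpc = AuthServiceProxy("http://rpcuser:rpcpass@127.0.0.1:8332")  # placeholder

def average_block_size(n=144):
    tip = rpc.getblockcount()
    sizes = [rpc.getblock(rpc.getblockhash(h))["size"]
             for h in range(tip - n + 1, tip + 1)]
    return sum(sizes) / len(sizes)

print("average block size, last ~day: %.0f bytes" % average_block_size())
```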
 

solex

Moderator
Staff member
Aug 22, 2015
Interesting thoughts in the OP @priestc.

My view is that high-frequency stress tests will only serve to drill down to the block size that splits the mining network 50/50, i.e. a "network-split attack". That is probably not a good number to have well known, as it just makes the work easier for a malicious actor. Fortunately, it is probably higher than 32MB, though less than 128MB. This is why the 32MB default limit is about as high as BCH should go until a lot of the GTI improvements and other optimisations are live.

The stress-test which created the 23MB block proved that massive spare capacity exists and that is a lot of what businesses need to know for capacity planning.

Frequent stress tests will just prove that nothing much has changed in a week or two. Having them every 6 months makes sense as we should expect some overall improvement in that timescale.
 

priestc

Member
Nov 19, 2015
My view is that high-frequency stress tests will only serve to drill-down to determine what block size is best to split the mining network into 50/50, i.e. a "network-split attack".
That is correct! I think it's better to know than it is to not know. I doubt miners will use this information to be malicious. Even if they do, they could have gotten the number by doing the testing on a private network anyways.

Having them every 6 months makes sense as we should expect some overall improvement in that timescale.
Doing the test infrequently gives the advantage of having enough time to fix problems that arise. If the tests happen weekly, then you only have one week to fix a problem; if the next test is in 6 months, you have 6 months to fix the problem, etc.

The downside is it will take longer to see trends if the tests are only making 2 stress blocks per year. The core developers think "centralization" will occur when blocks get too big. It'll take years to see that trend develop if only one stress block happens every 6 months...

It seems the people who have commented on this proposal so far prefer something less frequent than weekly. I have yet to see someone suggest daily or hourly or anything like that. I think the time of the stress test should be easy to remember. "Every Monday" is easy to remember, as is "the first Friday of each month". It's harder to remember "January 1st and July 1st" or "January 1st, May 1st, and Sept 1st".

There is also the difficulty adjustment that happens every 2 weeks. Maybe the stress test can be defined as 10 blocks before each difficulty adjustment. Or maybe it can be derived from the halving block... I think it's probably better to have it defined by a calendar date and time, rather than a block number, because human coordination is the goal and that will be easier with "human" dates and times.
 

imaginary_username

Active Member
Aug 19, 2015
@priestc First, the difficulty adjustment on BCH is not two weeks; it's a continuous 144-block rolling window. Second, I think these resources would be much better used to set up a bigger, more varied, and more quantifiable testnet (above and beyond the infrastructure in the Gigabit testnet), which can be freely monitored and tweaked, as well as used for other protocol evaluations. Dumping money into spamming mainnet, where data collection is difficult, while inconveniencing everyone is hardly the best use of money.
 

Peter R

Well-Known Member
Aug 28, 2015
I second what @solex and @imaginary_username said.

Further: the results of the stress tests basically "proved" what we already knew from the gigablock tests: that there was a bottleneck due to mempool acceptance at around 100 transactions per second. The performance of the real network mirrored the gigablock results surprisingly well, with the real network performing a bit worse, as we expected.
 

priestc

Member
Nov 19, 2015
Dumping money into spamming mainnet
It doesn't cost anyone money. Spammers can spam for free, and nodes by default opt out of zero-fee transactions. So there is no extra cost on anyone unless they opt in.

where data collection is difficult
Why is data collection on the live net "difficult"? The way I see it, it's the exact opposite. I (or anyone else) can *very* easily get performance data from multiple sources regarding the live net.

while inconveniencing everyone is hardly the best use of money.
Who is inconvenienced? The test lasts one block. It's over and done with in 10 minutes. If the block interval is ever lowered to something less than 10 minutes, it'll have even less impact on users.

@Peter R I don't know much about the Gigablock tests, but they are only valid if they are fresh. A few weeks ago, when the 22MB number was released, it may have been accurate, but today it might not be. 10 years from now it definitely won't be. All benchmark tests on a network like Bitcoin have to be performed continuously, because the network allows anyone new to join, and those new people may not have the same hardware as the people already there.

I consider the private test network used by the gigablock tests a scientific instrument. Like all scientific instruments, it needs to be regularly calibrated. The livenet stress blocks in the OP sound like a good way to calibrate the private network to make sure it's giving accurate results.

Also, you can't guarantee that miners on livenet are using the same software you used in your gigablock tests. Maybe a mining pool hired a developer to make a custom BCH implementation that can accept transactions into the mempool faster than 22MB/10min? If a miner publishes a stress block bigger than 22MB, that will have to be the conclusion. Knowing this is better than not knowing.
 

imaginary_username

Active Member
Aug 19, 2015
>Spammers can spam for free

Relying on miner opt-in for that test is very dubious; it's asking them to raise their orphan probability for PR. They'll also have to make their node IPs known. If we use only zero fees as you proposed, the test will likely be very ineffective and pointless - even implementations that have zero-fee relay have rate-limiters that will prevent this from going through.

>Why is data collection on the live net "difficult"?

It's difficult because you can't get the topology of the network, especially when you only have cooperation from a small fraction of the nodes, and fewer of the mining nodes. Without topology it's just a dick-measuring contest. Miners can already publish arbitrarily large blocks within consensus if they use a pre-built block anyway.

>Who is inconvenienced? The tests lasts one block. Its over and done with in 10 minutes. If the block interval is lowered to some interval less than 10 minutes, it'll have even less impact on users.

Everyone who would like to keep mempools synced is inconvenienced.

Note that you can already do all this without doing a BUIP if you think it's a good idea - just go ahead. But I personally don't think BU should spend any effort or money on this.
 

Griffith

Active Member
Jun 5, 2017
@priestc
I don't know much about the Gigablock tests, but they are only valid if they are fresh
I'm not sure I follow your logic here. If we did tests on the giganet which showed us software limitations, and we then improved the software, why does time discredit those findings? If anything, our limits on what we can do have gone up, not down. It would make a lot more sense to do additional testing on the gigablock testnet rather than on mainnet.

EDIT: For reference, the gigablock testnet did mine a 1GB block. It took roughly 10 minutes to propagate throughout the network and demonstrated the (at the time) theoretical limit of block size.
 

Wecx

New Member
Apr 13, 2018
I second what @solex and @imaginary_username said.

Further: the results of the stress tests basically "proved" what we already knew from the gigablock tests: that there was a bottleneck due to mempool acceptance at around 100 transactions per second. The performance of the real network mirrored the gigablock results surprisingly well, with the real network performing a bit worse, as we expected.
I actually tell people who bring up the stress test that it mirrored the gigablock results. The gigablock testnet is a real national treasure imo.
 