Getting the code to work with all blocks gives a pretty good set of test vectors. Currently I have the script encoding mostly working for the second pass, and working with malloc for the first pass. I used to have some fixed space big enough for most scripts, with an overflow file to deal with the few percent of cases.
The problem is that even a few percent of cases is a lot of cases when there are hundreds of millions of scripts!
So it created a performance bottleneck and I decided to switch to memory allocation. And since I need to use memory allocation anyway, I might as well do it for all cases, even the small ones. It is easier to debug things with a known working malloc first. Then I will change it to use a single large memory buffer for all the mallocs. That not only speeds things up a lot by eliminating system calls, it also makes the freeing of the memory costless.
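To make the single-large-buffer idea concrete, here is a minimal sketch of a bump-style arena allocator in C. The names (memarena, memalloc, memfree_all) and sizes are hypothetical, not the actual iguana API; the point is just that each allocation is a pointer bump out of one preallocated buffer, and "freeing" everything is a single counter reset.

#include <stdint.h>
#include <stdlib.h>

// Hypothetical arena: one big buffer, allocations are pointer bumps,
// freeing everything at once is a single counter reset.
typedef struct { uint8_t *buf; size_t size, used; } memarena;

static int memarena_init(memarena *a, size_t size)
{
    a->buf = malloc(size);              // one system allocation up front
    a->size = size;
    a->used = 0;
    return (a->buf != 0) ? 0 : -1;
}

static void *memalloc(memarena *a, size_t len)
{
    void *ptr;
    len = (len + 7) & ~(size_t)7;       // keep 8-byte alignment
    if ( a->used + len > a->size )
        return 0;                        // caller falls back or grows the arena
    ptr = a->buf + a->used;
    a->used += len;
    return ptr;
}

static void memfree_all(memarena *a) { a->used = 0; }   // "costless" free

The tradeoff is that individual allocations cannot be freed separately, which fits the bundle-at-a-time processing pattern described here: allocate everything for a bundle, then reset the whole arena when the bundle is done.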
It is doing better on the speed side, but I have to exclude about 5% of bundles as they have scripts that confuse things. Still, it is working 99%+ of the time, so I just need to track these cases down and change the malloc calls to use my memalloc.
N[201] Q.140 h.401663 r.170426 c.0:112000:0 s.170439 d.57 E.56:171050 M.401662 L.401663 est.8 262.8MB 0:04:20 4.097 peers.42/133 Q.(19822 60)
The above is a status line after 4 minutes 20 seconds. It got all 201 headers, synced and saved 170439 blocks, saved 57 bundles, and validated 401662 blockheaders, so it has the hashes for the entire blockchain. It is sustaining around 40 megabytes/sec, which is half the max speed, but since bandwidth wasn't the bottleneck, I shifted the processing around to create more CPU time for the final signature validation. It used about 300% CPU during the above, but this is still mostly the early blockchain, so there were not many tx per block in those days.
The early blocks are very small, so the bandwidth needed is much lower, but the sig validation and utxo vectors need all prior blocks. Having several minutes of wasted bandwidth is probably worth being able to start the serial sweep that much sooner. I can make it use a lot more bandwidth during the entire time, but CPU usage goes up and it takes all 8 cores to process 100MB/sec. So the final tuning will be a careful balancing of the CPU allocated for sig validation and the CPU for processing the bandwidth.
I think I can dynamically split the CPU between the two by estimating the time to completion for each and having a feedback mechanism that gets them to finish as close to the same time as possible.
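A minimal sketch of what that feedback step could look like, assuming hypothetical counters for remaining work and measured throughput on each side; the real tuning would shift worker threads rather than adjust a single ratio, but the idea is the same: give more cores to whichever side is projected to finish later.

// Hypothetical feedback step: compare estimated time-to-completion for
// signature validation vs. bandwidth processing and nudge the CPU split
// toward whichever side is projected to finish later.
double rebalance(double sig_work_left, double sig_rate,   // work units and units/sec
                 double net_work_left, double net_rate,
                 double sig_cpu_share)                     // current fraction of cores for sigs
{
    double sig_eta = sig_work_left / (sig_rate + 1e-9);
    double net_eta = net_work_left / (net_rate + 1e-9);
    double step = 0.05;                                    // small adjustment per tick
    if ( sig_eta > net_eta )
        sig_cpu_share += step;                             // sigs are behind, give them cores
    else if ( net_eta > sig_eta )
        sig_cpu_share -= step;                             // bandwidth side is behind
    if ( sig_cpu_share < 0.1 ) sig_cpu_share = 0.1;        // keep both sides making progress
    if ( sig_cpu_share > 0.9 ) sig_cpu_share = 0.9;
    return sig_cpu_share;
}

Called once per status interval, this converges the two estimated completion times toward each other instead of letting one side sit idle at the end.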