Gold collapsing. Bitcoin UP.

cypherdoc

Well-Known Member
Aug 26, 2015
5,257
12,995
for newbies, here's the more hard core SW math post i made a while back:

https://bitco.in/forum/threads/gold-collapsing-bitcoin-up.16/page-308#post-11291
I think it is useful to think of segregated witness as consisting of at least three parts that can be considered separately:

1) Separation of signatures. SegWit is fundamentally just a reorganization of how transaction data is stored in blocks. It stores transactions in two parts, separated into two merkle trees. This makes pruning of signatures easier. It also helps with malleability, since the non-signature part of the transaction is not malleable (I guess you could still malleate the signature portion, which would only affect the witness merkle tree).

2) Scaling/witness discount/accounting tricks. This characteristic arises from the block size limit combined with how they chose to implement SegWit as a soft fork. Since the witness portion of the merkle tree is not seen by old nodes, it is not constrained by the block size limit. They could have left the witness portion unbounded, made it fit within the 1MB limit, or added a static limit (e.g. 4MB). Instead they chose to apply a formula that counts witness data at a 1/4 discount towards the 1MB block size limit (a rough arithmetic sketch follows after this list).

3) Changes to “witness” portion of transactions. This is where they include things like script versioning, non-quadratic scaling of sighashes, input signing, and various other changes. Again, none of these changes are intrinsic to SegWit, but the way it is introduced as a soft fork gives them an opportunity to include this set of changes they want. Some of these changes are nice, but they could just as easily be introduced separately in bite-sized portions, rather than shoved down everyone’s throats in one large clump.
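As a rough sketch of the discount arithmetic in point 2 (the 4,000,000 "weight" figure is the one used in the segwit proposal; the byte counts here are invented inputs for illustration):

#include <stdio.h>
#include <stdint.h>

/* Sketch of the accounting described in point 2: non-witness ("base") bytes
   count in full, witness bytes count at a 1/4 discount, and the discounted
   total must still fit under the old 1MB figure
   (equivalently, 4*base + witness <= 4,000,000 weight). */
int main(void)
{
    uint64_t base_bytes = 900000;     /* hypothetical non-witness data in a block */
    uint64_t witness_bytes = 400000;  /* hypothetical witness data in a block     */

    uint64_t weight = 4 * base_bytes + witness_bytes;
    double virtual_size = (double)base_bytes + (double)witness_bytes / 4.0;

    printf("weight       = %llu (limit 4000000)\n", (unsigned long long)weight);
    printf("virtual size = %.0f bytes (limit 1000000)\n", virtual_size);
    return 0;
}

With these inputs the block lands exactly at the limit: 1.3MB of actual data counted as 1MB.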
the big picture way to look at this is that Blockstream core dev is attempting to unshackle their ability to change Bitcoin. they don't think it works. or, more likely, they are surprised it does work and carry the regret of a missed opportunity to the point where they are now willing even to criticize Satoshi to try and get what they want. which ignores the tremendous success that has gotten us to where we are today as a SOV, even while limiting its full potential as p2p cash. they don't like that it is meant to be primarily a hands-off technology that is hard to change, and they've obfuscated this discussion with fear about HF vs SF. like i said, this is not about HF vs SF anymore; it's about the meat and potatoes of what, in toto, both parties are trying to change. and SW is trying to change the very fundamentals of Bitcoin's consensus rules in a huge way so that core devs can make "all sorts of changes" that just happen to facilitate their fiat business opportunities and satisfy their investors.
 
Last edited:

Justus Ranvier

Active Member
Aug 28, 2015
875
3,746
@Mengerian I think I know the reason they aren't doing the logical, sensible thing and breaking this out into a series of clean, bite-sized portions.

I met Adam Back at a Bitcoin conference, which I believe was the first Las Vegas conference. He was convinced that gridlock and contention were causing Bitcoin to ossify and that there was just one chance to make necessary major changes.

That sounded like a self-fulfilling prophecy to me, so I asked why not instead have a long-term roadmap with planned periodic hard forks, where beneficial changes could be argued for well in advance and upgrades would be smooth since they would be scheduled.

Obviously he didn't take that path, and his belief that he'd only get one shot at this caused him to take underhanded, dishonest, and indefensible actions to get that one shot.

Of course it's the underhanded, dishonest, and indefensible actions which caused the situation he was worried about in the first place, but the people caught in the psychosis of a self-fulfilling prophecy can never see that.
 

jl777

Active Member
Feb 26, 2016
279
345
utxo space is not a big issue. iguana encodes each utxo into 4 bytes. how is this any bottleneck?

using a totally backward compatible method, I already separate the vin data (mostly sigs) into a separate directory. For nodes that don't need to relay, this can be purged after validation. It is not optimized and I can still extract the pubkeys and p2sh scripts that are redundant inside the vindata, but it is ~35GB.

The rest of the data is mostly in the read-only parallel bundles, which have enough data to allow for blockexplorer-level queries. Not just raw blockchain queries: you can query address balances directly. This dataset is also around 35GB, but unlike normal blockchain data it compresses well, to half the size, so there would be 17GB in a compressed readonly filesystem. It does not have to be revalidated over and over; you just need to make sure the readonly filesystem hasn't been tampered with. And it has the search indexes built in, which means it is "instantly on" after each restart.
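The "make sure it hasn't been tampered with" step can be as simple as hashing each readonly bundle file and comparing against a known digest. A minimal sketch using OpenSSL's SHA-256 (the path and expected digest come from the caller; this is not iguana's actual code):

#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

/* Hash a readonly bundle file and compare it with the digest we expect.
   Returns 0 if the file matches, nonzero otherwise. */
static int verify_bundle(const char *path, const unsigned char expected[SHA256_DIGEST_LENGTH])
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    SHA256_CTX ctx;
    SHA256_Init(&ctx);

    unsigned char buf[65536];
    size_t n;
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
        SHA256_Update(&ctx, buf, n);
    fclose(fp);

    unsigned char digest[SHA256_DIGEST_LENGTH];
    SHA256_Final(digest, &ctx);

    return memcmp(digest, expected, SHA256_DIGEST_LENGTH) == 0 ? 0 : 1;
}

Because the bundle never changes, the expected digest only has to be obtained once (and, as noted below, a torrent infohash serves the same purpose).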

My first totally unoptimized version of the dynamic data came in at 1.3GB, but that is mostly due to the account balances being updated for all addresses. The actual spend vectors are 350MB, but they are in the readonly section, so they are much cheaper to store.

I will be increasing the size a bit more so I can get the address balance as of any block. That seems well worth the increased space.

So it is possible to encode everything other than the vindata in less than 20GB, and that is at the level of insight explorer queries, not just bitcoind RPC.

Also, the main bittorrent network can be used for archival storage of all the readonly files (including the vindata), so we get a fully decentralized storage of the ever growing blockchain, which allows any node at any time to fully validate everything.

But it does take almost an hour now to fully sync from scratch. I think if my server had an SSD, it would be able to do it in 30 minutes.
 

cypherdoc

Well-Known Member
Aug 26, 2015
5,257
12,995
@jl777

It's great to have someone technical like yourself in the thread. I'm sure we'll all learn a lot from you going forward. And hopefully we can return the favor with our experience in game theory and economics.

You should read the links I've provided to the writings of ajtowns, especially in regard to the bytes required for SW. I've learned a lot about SW from him.
 

Peter R

Well-Known Member
Aug 28, 2015
1,398
5,595
utxo space is not a big issue. iguana encodes each utxo into 4 bytes. how is this any bottleneck?
I don't understand how this would be possible. For each unspent output, a node needs to store (a) its value, and (b) its scriptPubKey (the rules which allow the output to be spent).

In a raw Bitcoin transaction, the output's value is stored using 8 bytes (i.e., a 64-bit unsigned int), while the scriptPubKey (for P2PKH) is 25 bytes (5 bytes that identify the script as "pay-to-pubkey-hash" and 20 bytes specifying the pubkey hash itself).
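For reference, those 33 bytes break down roughly as follows (a conceptual sketch only; real serialization is byte-packed with no struct padding, and the names here are not from any particular client):

#include <stdint.h>

/* Conceptual layout of a serialized P2PKH output (illustration only). */
struct p2pkh_output {
    uint64_t value;        /* 8 bytes: amount in satoshis                  */
    uint8_t  script_len;   /* varint: 0x19 (25) for P2PKH                  */
    uint8_t  script[25];   /* OP_DUP OP_HASH160 0x14 <20-byte pubkey hash>
                              OP_EQUALVERIFY OP_CHECKSIG                   */
};
/* 8 + 25 = 33 bytes of payload per unspent output, before any indexing tricks. */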

How can 33 bytes be reduced to only 4 bytes?
 

jl777

Active Member
Feb 26, 2016
279
345
my point of view is as an implementor, so I need to understand it well enough to code it.

It is possible I am totally confused about SW, but at this point it seems that it costs 2 bytes per tx and 1 byte per vin more in HDD space. Compared to a 2MB hardfork, segwit has less tx capacity. This is now admitted by the bitcoin devs.

they now say they never claimed it would save space....
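For scale, the overhead being referred to (a 2-byte marker+flag per transaction plus at least a 1-byte witness-item count per input in the segwit serialization) works out roughly as follows; the block composition numbers are invented for illustration:

#include <stdio.h>

/* Back-of-the-envelope arithmetic for the per-tx and per-vin overhead
   mentioned above. The tx and input counts are hypothetical, not measured. */
int main(void)
{
    unsigned txs_per_block = 4000;   /* hypothetical transactions per block */
    unsigned vins_per_tx = 2;        /* hypothetical inputs per transaction */

    unsigned extra_bytes = txs_per_block * (2 + vins_per_tx * 1);
    printf("extra serialized bytes per block: %u\n", extra_bytes); /* 16000 */
    return 0;
}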
I don't understand how this would be possible. For each unspent output, a node needs to store (a) its value, and (b) its scriptPubKey (the rules which allow the output to be spent).

In a raw Bitcoin transaction, the output's value is stored using 8 bytes (i.e., a 64-bit unsigned int), while the scriptPubKey (for P2PKH) is 25 bytes (5 bytes that identify the script as "pay-to-pubkey-hash" and 20 bytes specifying the pubkey hash itself).

How can 33 bytes be reduced to only 4 bytes?
I have explained in the various iguana threads, but it used to be 6 bytes. When I coded it up I realized I could have each bundle maintain a vector, so the 2 bytes needed for the bundle identifier are gone.

The output script is in the readonly space, which is invariant and also compressible by 50%. There is a lot of redundancy in the output scripts, and the vast majority can be encoded very efficiently in 4 bits.

Each unspent has the pkind, which is the pubkeyhash index, so by dereferencing all that you can reconstruct the spend script.

The utxo dataset is the dynamically changing data that needs to be in RAM, so by strength-reducing the space-consuming hashes and scripts to references into the readonly bundles, the space needed in RAM is reduced to 4 bytes. I still need to decide what tradeoffs to make about efficiently searching through this data. It is actually possible to reduce the space to less than 1 bit per vout, but over time that will grow. Still, using RLL might reduce its size enough to be the most efficient representation of the current utxo set.
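In rough code form, the idea is that the mutable per-output record kept in RAM is just a small index, with everything bulky dereferenced from the read-only bundle. The names and fields below are illustrative guesses, not iguana's actual definitions:

#include <stdint.h>

/* Illustrative sketch (not iguana's real structs). */

struct vout_ro {            /* immutable, lives in the memory-mapped bundle */
    uint64_t value;         /* satoshis                                     */
    uint32_t pkind;         /* index into the bundle's pubkey-hash table    */
    uint8_t  script_type;   /* small code (fits in 4 bits) for standard
                               templates such as P2PKH or P2SH              */
};

typedef uint32_t utxo_ref;  /* the ~4 bytes of mutable state kept in RAM,
                               indexing the vout_ro entry above             */

/* Reconstructing a spend script: follow the utxo_ref to its vout_ro, map
   script_type to its template, and splice in the 20-byte hash found at
   pkind in the bundle's pubkey table. */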
 
  • Like
Reactions: majamalu and YarkoL

Dusty

Active Member
Mar 14, 2016
362
1,172
(twitter quote)
I prefer this example:
I have explained in the various iguana threads, but it used to be 6 bytes. When I coded it up I realized I could have each bundle maintain a vector, so the 2 bytes needed for the bundle identifier are gone.

The output script is in the readonly space, which is invariant and also compressible by 50%. There is a lot of redundancy in the output scripts, and the vast majority can be encoded very efficiently in 4 bits.

Each unspent has the pkind, which is the pubkeyhash index, so by dereferencing all that you can reconstruct the spend script.

The utxo dataset is the dynamically changing data that needs to be in RAM, so by strength-reducing the space-consuming hashes and scripts to references into the readonly bundles, the space needed in RAM is reduced to 4 bytes. I still need to decide what tradeoffs to make about efficiently searching through this data. It is actually possible to reduce the space to less than 1 bit per vout, but over time that will grow. Still, using RLL might reduce its size enough to be the most efficient representation of the current utxo set.
I'm not sure I understand all the details, and I'm interested in them (I'm an implementor too), care to make a complete example?
Or maybe point me to a thread where you already explained those details?
Thanks!
 
  • Like
Reactions: AdrianX and bitsko

Peter R

Well-Known Member
Aug 28, 2015
1,398
5,595
I have explained in the various iguana threads, but it used to be 6 bytes. When I coded it up I realized I could have each bundle maintain a vector, so the 2 bytes needed for the bundle identifier are gone.

The output script is in the readonly space, which is invariant and also compressible by 50%. There is a lot of redundancy in the output scripts, and the vast majority can be encoded very efficiently in 4 bits.

Each unspent has the pkind, which is the pubkeyhash index, so by dereferencing all that you can reconstruct the spend script.

The utxo dataset is the dynamically changing data that needs to be in RAM, so by strength-reducing the space-consuming hashes and scripts to references into the readonly bundles, the space needed in RAM is reduced to 4 bytes. I still need to decide what tradeoffs to make about efficiently searching through this data. It is actually possible to reduce the space to less than 1 bit per vout, but over time that will grow. Still, using RLL might reduce its size enough to be the most efficient representation of the current utxo set.
Thanks for the explanation. I think I get what you're saying. In my own words, you're working on a new database structure that is optimized for storing the UTXO. You're still storing the pubKeyHash as a full 20 bytes, but you're placing it in "colder storage" memory and have only a reference to that memory in the "hot storage" you've reserved for the UTXO. When you say "4 bytes per output" you are not referring to all of the information you store, but instead only the information required to quickly "reference" all of the information about an unspent output.

Is that a reasonable description of what you're doing?
 

Richy_T

Well-Known Member
Dec 27, 2015
1,085
2,741
Last edited:
  • Like
Reactions: AdrianX

jl777

Active Member
Feb 26, 2016
279
345
I analyzed the performance of bitcoin and determined that its reliance on a DB for all blockchain RPC was a big bottleneck.

So I use memory-mapped files wherever possible. This allows the system to even use the L2 cache for the most-used data. It is also designed to be searchable in parallel and maps well to GPU architectures.
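A minimal sketch of the memory-mapping approach on POSIX systems (the bundle path is whatever the caller supplies; nothing here is iguana's actual code). The kernel pages the read-only data in on demand, so hot regions end up in the CPU caches without an explicit DB layer:

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a read-only bundle file into memory so lookups become plain pointer
   arithmetic instead of database queries. Returns NULL on failure. */
static const void *map_bundle(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return NULL;
    }

    void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after closing the descriptor */
    if (base == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return base;
}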

With that as background, there are a few things to consider. Invariant data (data that never changes) costs much, much less to have around than data that is actively changing. Not only does readonly data get created faster by using append-only methods, once created it can be fully decentralized via the main bittorrent network. Unlike bootstrap files, which change all the time, the readonly data never changes, so its torrent hash never changes.

That means anything in the readonly space will eventually not even cost relay nodes any bandwidth, as the torrent network will handle that part.

Now we come to the question of how much of the blockchain can be made into permanent, never-changing readonly data. The answer is a bit surprising. It turns out that almost all of the data can be transformed into this form. I even put the vin linkages in the readonly dataset, as once all the prior bundles are there, it is a one-time lookup to find any txids external to the bundle that are spent.

My first pass for this has it reduced to:

struct iguana_account { int64_t total; uint32_t lastind; }; /* 12 bytes per address: running balance plus the index of the address's most recent vout */

and a single 32-bit index, which includes a one-bit spent flag, for each vout.

The 32-bit index is "bloated" so I can encode a linked list to rapidly find all spends by an account within a bundle. 31 bits are used for that, but I plan to add another 32 bits so I can encode the height at which it was spent. This will allow rapid calculation of balances as of any height, which I think is a nice luxury to have for several use cases.

The spent flag is a one-time write, so it is not exactly readonly, but it is not as demanding as fully r/w data, and it can be optimized more if needed.
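Sketched as a struct, the per-vout index just described (including the planned spend-height word) might look like this; the names are an illustrative guess, not the actual iguana definition:

#include <stdint.h>

/* Illustrative per-vout index: one-time-write spent flag, 31-bit link to the
   same account's previous vout in the bundle, plus the planned extra word
   recording the height at which the output was spent. */
struct vout_index {
    uint32_t spent     : 1;
    uint32_t prev_vout : 31;
    uint32_t spent_height;   /* 0 while unspent; enables balance-as-of-height */
};

/* Balance as of height H: start from iguana_account.lastind, walk the
   prev_vout linked list, and sum the values (stored in the readonly bundle)
   of outputs with spent_height == 0 or spent_height > H. */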

The 12-byte account structure allows keeping current account balances updated for all addresses. This does need to be in RAM, or at least r/w enabled, but this level of data is well beyond the normal bitcoind RPC and more in the realm of insight blockexplorer queries.

tl;dr: using utxo scalability as an excuse to force the segwit softfork is pretending that more efficient implementations are not possible.
I prefer this example:
I'm not sure I understand all the details, and I'm interested in them (I'm an implementor too), care to make a complete example?
Or maybe point me to a thread where you already explained those details?
Thanks!
iguana child board in the bitcoin protocol section
 

Richy_T

Well-Known Member
Dec 27, 2015
1,085
2,741
But it does take almost an hour now to fully sync from scratch. I think if my server had SSD, it would be able to do it in 30 minutes
Get an SSD then :)

If you don't mind consumer grade, 250GB is under $100. I'd front you that myself for all the work you've done.
 
  • Like
Reactions: Norway and AdrianX

jl777

Active Member
Feb 26, 2016
279
345
It is just the configuration of the VPS I used for testing. I also test on a 4GB RAM, 1.3GHz i5 laptop, and that has an SSD.

1 hour is fast enough, especially as I think I can get it closer to 40 minutes. If I had really fast hardware, it would not be so painful when the software is slow; this is a form of self-torture that spurs me to make things faster.
 

AdrianX

Well-Known Member
Aug 28, 2015
2,097
5,797
bitco.in
That's not needed: we could prune the signatures from current txs if we want; it's just an implementation detail (of how to save txs on disk).

Also, giving a discount to whoever uses segwit is dishonest: it gives an advantage to those users while they use the same resources on the net.
This is done to push the use of LN by discounting its transactions.
That's not correct: if you implement segwit with a hard fork (as it should be), you don't need the trick of using an anyone-can-spend tx, and you can measure the block size correctly, so you have blocks of the max allowed size as usual.
Thanks @Dusty. I also think SW should be a hard fork proposal. But since it is a soft fork proposal, does that invalidate my statement?
 

jl777

Active Member
Feb 26, 2016
279
345

AdrianX

Well-Known Member
Aug 28, 2015
2,097
5,797
bitco.in
@Peter R I'm trying to think of ways to visualize bitcoin growth, particularly blockchain growth and Unspent Transaction Output (UTXO) growth.

Bitcoin's "Metcalfe's Law" relationship between market cap and the square of the number of transactions shows a reasonable correlation between the number of transactions and price.

If one were to make a simple projection from the data mentioned above using price, say $1,000, $10,000, $100,000, or $1,000,000, could one make an assumption about the number of transactions and create an estimate of both the block size and UTXO size from that data?

if the answer resembles a yes, could you provide an estimate :):):)
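In case it helps frame the question, here is one way such a projection could be set up under the stated assumption that market cap scales with the square of transaction count. Every number in it (calibration price, transaction count, bytes per transaction) is an invented placeholder, not real data, and it only addresses the block-size half of the question; projecting UTXO size would need a further assumption about what fraction of new outputs stay unspent.

#include <math.h>
#include <stdio.h>

/* Toy projection: if market_cap = k * txs^2 and the coin supply is roughly
   fixed, then txs scales with sqrt(price). All inputs are placeholders. */
int main(void)
{
    double price0 = 400.0;      /* hypothetical calibration price, USD       */
    double txs0 = 200000.0;     /* hypothetical daily tx count at that price */
    double avg_tx_bytes = 300.0;
    double blocks_per_day = 144.0;

    double prices[] = { 1000.0, 10000.0, 100000.0, 1000000.0 };
    for (int i = 0; i < 4; i++) {
        double txs = txs0 * sqrt(prices[i] / price0);
        double mb_per_block = txs * avg_tx_bytes / blocks_per_day / 1e6;
        printf("$%-8.0f -> ~%8.0f tx/day -> ~%6.2f MB blocks\n",
               prices[i], txs, mb_per_block);
    }
    return 0;
}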
 
Last edited: