vinmetascript encoding: 03494921

jl777

Active Member
Feb 26, 2016
279
345
To give an idea of how efficiently things can be encoded, the original vinscript is something like:

473044022034004e9584a9842daeb8ab80592ad1096edd88b9421b9b6bacab257e613f25d502201f5bd2dcdc4bea440921b3bb3f0f4929d50654b218d70cf4685d7bb7b02246560121028cf173bf303093eb41d6ae3d94aafba2bf26a67065c589e119784a30cdba0a81

107 bytes, 73 byte sig, 33 byte pubkey
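
To make that breakdown concrete, here is a minimal C sketch that splits a standard sig + pubkey vinscript into its two pushes. The helper names (`hexval`, `decode_hex`, `split_vinscript`) are made up for illustration and are not iguana functions, and it assumes both items use direct pushes of 1 to 75 bytes. Note that the sample hex above starts with a 0x47 push, so this particular script decodes to a 71 byte sig and 106 bytes total; the 107 byte figure presumably corresponds to the common case where the sig is one byte longer.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers for illustration only, not iguana functions. */

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode a hex string into buf; returns the byte count or -1 on error. */
static int decode_hex(const char *hex, uint8_t *buf, size_t maxlen)
{
    size_t i, n = strlen(hex);
    if ((n & 1) != 0 || n / 2 > maxlen) return -1;
    for (i = 0; i < n / 2; i++) {
        int hi = hexval(hex[2 * i]), lo = hexval(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) return -1;
        buf[i] = (uint8_t)((hi << 4) | lo);
    }
    return (int)(n / 2);
}

/* Split a standard <sig> <pubkey> vinscript made of two direct pushes
   (push opcodes 0x01..0x4b). Returns 0 on success, -1 if the layout differs. */
static int split_vinscript(const uint8_t *script, int len, int *siglen, int *pubkeylen)
{
    int pos;
    if (len < 1 || script[0] < 1 || script[0] > 75) return -1;
    *siglen = script[0];
    pos = 1 + *siglen;                 /* skip the push byte and the sig */
    if (pos >= len || script[pos] < 1 || script[pos] > 75) return -1;
    *pubkeylen = script[pos];
    pos += 1 + *pubkeylen;             /* skip the push byte and the pubkey */
    return (pos == len) ? 0 : -1;      /* nothing may trail the pubkey */
}
```

Feeding the hex string through `decode_hex` and then `split_vinscript` gives the sig and pubkey lengths without their push bytes.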

The vin itself is encoded as follows:

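struct iguana_spend
{
    uint32_t spendtxidind,scriptoffset:31,coinbase:1; // index of the spent txid, vinscript location, coinbase flag
    int16_t prevout; // output index being spent
    uint16_t numsigs:4,numpubkeys:4,p2sh:1,sighash:4,external:1,sequenceid:2; // descriptors for the meta vinscript
} __attribute__((packed));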

the spendtxidind is the index of the txid being spent in the canonical ordering, so instead of a 256 bit txid, it maps to 32 bits. there is another 32 bits holding the scriptoffset (31 bits, for the location of the vinscript in the variable store) plus a 1 bit coinbase flag, and then a bunch of descriptors for how to interpret the meta vinscript.
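
A quick way to see how compact the vin entry is: the fields add up to 32 + 32 + 16 + 16 bits, so the packed struct should come out at 12 bytes. The sketch below repeats the struct as posted and adds a made-up helper, `make_standard_spend`, that fills in an entry for a standard 1 sig, 1 pubkey spend. The helper and its parameter names are assumptions for illustration, not iguana code, and `__attribute__((packed))` needs gcc or clang.

```c
#include <stdint.h>
#include <string.h>

/* The vin struct as posted. */
struct iguana_spend
{
    uint32_t spendtxidind,scriptoffset:31,coinbase:1;
    int16_t prevout;
    uint16_t numsigs:4,numpubkeys:4,p2sh:1,sighash:4,external:1,sequenceid:2;
} __attribute__((packed));

/* Hypothetical helper: build the entry for a standard spend with one sig
   and one pubkey. Other descriptor bits are left at zero. */
static struct iguana_spend make_standard_spend(uint32_t txidind, int16_t prevout, uint32_t scriptoffset)
{
    struct iguana_spend s;
    memset(&s, 0, sizeof(s));
    s.spendtxidind = txidind;                   /* 32 bit index instead of a 256 bit txid */
    s.scriptoffset = scriptoffset & 0x7fffffff; /* only 31 bits are available */
    s.prevout = prevout;
    s.numsigs = 1;
    s.numpubkeys = 1;
    return s;
}
```

Since the counts live in the struct, the metavinscript does not have to spend any bytes saying how many sigs and pubkeys to expect.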

The metavinscript takes advantage of the fact that the bulky high entropy sigs are only needed during validation and by full relaying nodes that need to reconstruct all the raw block data. So there is only a need to store the minimum info required to reproduce the vinscript:

03494921

what? how is that even possible, you ask. The reason the 107 bytes can be encoded into 4 bytes is that I take advantage of addresses being reused and sigs being purgeable. The sigs are stored in a grow-down stack and the pubkeys are stored in the variable store (the heap).

03 -> 3 bytes follow in total. This is needed to encode the length of any extra vinscript bytes used for nonstandard redeems, like if/else values of 00 or 01 pushed to the stack at the end. It also acts as an error check to make sure there are no size errors.

49 -> size of sigs (0x49 = 73 bytes)
49 -> stack offset (negative) where the sigs start (0x49 = 73)
21 -> heap offset (positive) where the pubkeys start (0x21 = 33)

from the vin struct we know how many sigs and how many pubkeys there are. So for a reused address and a standard 107 byte spend, it is encoded into 4 bytes for all but the first time. The first one does take the 4 bytes extra, but that is recouped overall if just 4% of addresses are reused just once.
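
The byte by byte reading of 03494921 can be sketched in C as below. This is only an illustration: `struct metavin` and `decode_metavin` are made-up names, and it assumes the simplest case where every field fits in a single byte and there are no extra bytes for nonstandard redeems.

```c
#include <stdint.h>

/* Hypothetical decoded form of a standard metavinscript. */
struct metavin
{
    uint8_t totallen;     /* number of bytes that follow the length byte */
    uint8_t sigsize;      /* total size of the sigs */
    uint32_t stackoffset; /* negative offset from the end of the stack where the sigs start */
    uint32_t heapoffset;  /* positive offset into the heap where the pubkeys start */
};

/* Decode the 4 byte standard form, assuming one byte per field.
   Returns 0 on success, -1 if the size check fails. */
static int decode_metavin(const uint8_t *data, int len, struct metavin *mv)
{
    if (len != 4 || data[0] != 3) return -1; /* the leading byte doubles as an error check */
    mv->totallen = data[0];
    mv->sigsize = data[1];
    mv->stackoffset = data[2];
    mv->heapoffset = data[3];
    return 0;
}
```

For the bytes 03 49 49 21 this yields a sig size of 73, a stack offset of 73 back from the end, and a heap offset of 33.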

so in the standard reused address case, the space savings are ~160 bytes -> 12+6 bytes, or about a 90% reduction in size (after the sigs are purged). With the sigs unpurged, it is only about a 50% reduction.

However, both the vin struct and the meta vinscript are compressible and I am seeing about a 50% compression rate, so this means around 20x compression for the reused address case.
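
As a rough check on those figures, the arithmetic can be written out in a few lines of C. The function names are made up for illustration, and the inputs are just the approximate numbers quoted above: ~160 bytes before, 12+6 = 18 bytes after, and 9 bytes once the 50% compression is applied.

```c
/* Percent reduction when going from `before` bytes to `after` bytes. */
static double percent_reduction(double before, double after)
{
    return 100.0 * (1.0 - after / before);
}

/* Overall compression factor, e.g. 2.0 means half the size. */
static double compression_factor(double before, double after)
{
    return before / after;
}
```

With these inputs, `percent_reduction(160, 18)` comes out just under 89% and `compression_factor(160, 9)` just under 18x, which is where the rounded "90%" and "around 20x" come from.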

If this sort of encoding is used in the raw block itself, then we could get an effective 1.5MB capacity without changing anything about the blocksize, assuming about half the tx use reused addresses. I don't have stats on the current composition of reused addresses, so that is just a guesstimate.

James

P.S. I use varints for the script offset encodings, so it is actually going to take more than 1 byte per script and pubkey offset. But even in the worst case of 5 bytes for each, we end up with 11 bytes per standard metavinscript. Still quite a good amount of savings.
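
For reference, here is a minimal sketch of a varint codec in C. It assumes the varints follow Bitcoin's CompactSize format (1, 3, 5 or 9 bytes), which would fit the "worst case of 5 bytes" for a 32 bit offset; the post does not say which varint format iguana actually uses, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Encode val as a Bitcoin style CompactSize varint (little endian payload).
   Returns the number of bytes written: 1, 3, 5 or 9. buf must hold 9 bytes. */
static int varint_encode(uint8_t *buf, uint64_t val)
{
    int i, n;
    if (val < 0xfd) { buf[0] = (uint8_t)val; return 1; }
    if (val <= 0xffff) { buf[0] = 0xfd; n = 2; }
    else if (val <= 0xffffffffULL) { buf[0] = 0xfe; n = 4; }
    else { buf[0] = 0xff; n = 8; }
    for (i = 0; i < n; i++)
        buf[1 + i] = (uint8_t)(val >> (8 * i));
    return 1 + n;
}

/* Decode a CompactSize varint from buf into *valp.
   Returns the number of bytes consumed. */
static int varint_decode(const uint8_t *buf, uint64_t *valp)
{
    int i, n;
    if (buf[0] < 0xfd) { *valp = buf[0]; return 1; }
    n = (buf[0] == 0xfd) ? 2 : (buf[0] == 0xfe) ? 4 : 8;
    *valp = 0;
    for (i = 0; i < n; i++)
        *valp |= (uint64_t)buf[1 + i] << (8 * i);
    return 1 + n;
}
```

Under this assumption small offsets like 0x49 or 0x21 still take a single byte, and only offsets of 0x10000 and above need the full 5 bytes.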

jl777

the above describes the details.

the vin for each bitcoin tx has a signature(s) pubkey(s) [p2sh script] [extra stuff]

unless it doesn't, as with custom outputs that don't require signatures or pubkeys. the counts relevant to the vinscript are already in the vin entry, so they don't take any extra space. Almost everything uses a fixed allocation of memory, and often its position indicates its index. However, scripts are definitely not fixed size, but they are full of redundancies.

iguana identifies any previously seen pubkey or p2sh script and converts it to an offset in the heap. the metascript then uses these offsets to encode the raw vin. The signatures won't ever duplicate (we hope!), so they are always just put onto a grow-down stack. At the end of the bundle processing, the gap between the end of the heap and the top of the stack is removed by shifting the sig data down. Since the offsets for the sigs are negative offsets relative to the end of the stack, all the values stay valid even after the data is moved.
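
The heap plus grow-down stack layout can be sketched in C like this. It is a toy model under my own assumptions, not iguana's actual code: `struct arena` and all the function names are made up, everything lives in one flat buffer, and the lookup of previously seen pubkeys is left out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical arena: the heap grows up from offset 0, the sig stack
   grows down from the end of the buffer. */
struct arena
{
    uint8_t *mem;
    uint32_t size;      /* total bytes in mem */
    uint32_t heapused;  /* bytes used by pubkeys/p2sh scripts at the front */
    uint32_t stackused; /* bytes used by sigs at the back */
};

/* Append data to the heap; returns its positive offset from the start,
   or UINT32_MAX if it would collide with the stack. */
static uint32_t heap_add(struct arena *a, const uint8_t *data, uint32_t len)
{
    uint32_t offset = a->heapused;
    if (len > a->size - a->heapused - a->stackused) return UINT32_MAX;
    memcpy(a->mem + offset, data, len);
    a->heapused += len;
    return offset;
}

/* Push sig data onto the grow-down stack; returns its offset measured
   back from the end of the stack (the "negative" offset), or 0 on failure. */
static uint32_t stack_push(struct arena *a, const uint8_t *data, uint32_t len)
{
    if (len == 0 || len > a->size - a->heapused - a->stackused) return 0;
    a->stackused += len;
    memcpy(a->mem + a->size - a->stackused, data, len);
    return a->stackused;
}

/* Resolve a negative stack offset to a pointer. */
static uint8_t *stack_ptr(struct arena *a, uint32_t negoffset)
{
    return a->mem + a->size - negoffset;
}

/* Remove the gap by shifting the sig data down next to the heap.
   Offsets relative to the end of the stack are unchanged.
   Returns the new total size. */
static uint32_t arena_compact(struct arena *a)
{
    memmove(a->mem + a->heapused, a->mem + a->size - a->stackused, a->stackused);
    a->size = a->heapused + a->stackused;
    return a->size;
}
```

After `arena_compact`, `stack_ptr` with the same negative offsets still lands on the same sig bytes, because both the data and the end of the stack moved by the same amount.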

Further, the sig data is intentionally put at the end of a bundle to allow purging all the sigs by truncating the file. Once you do that, you can't be a relaying node anymore, but it will save about half the space used.