To give an idea of how efficiently things can be encoded, the original vinscript is something like:
473044022034004e9584a9842daeb8ab80592ad1096edd88b9421b9b6bacab257e613f25d502201f5bd2dcdc4bea440921b3bb3f0f4929d50654b218d70cf4685d7bb7b02246560121028cf173bf303093eb41d6ae3d94aafba2bf26a67065c589e119784a30cdba0a81
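107 bytes, 73 byte sig, 33 byte pubkey (typical sizes; this exact example comes to 106 bytes with a 71 byte sig, since DER sig lengths vary by a byte or two)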
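To make the byte counts concrete, here is a minimal sketch that splits a vinscript made only of direct pushes into its items. The helper names (hexval, decode_hex, push_lengths) are made up for illustration and are not from iguana; it only handles push opcodes 0x01 to 0x4b, which is all a standard pay-to-pubkey-hash spend uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode a hex string into buf; returns the byte count or -1 on error. */
static int decode_hex(const char *hex, uint8_t *buf, size_t maxlen)
{
    size_t i, n;
    assert(hex != NULL && buf != NULL);
    n = strlen(hex);
    if ((n & 1) != 0 || n / 2 > maxlen) return -1;
    for (i = 0; i < n / 2; i++) {
        int hi = hexval(hex[2 * i]), lo = hexval(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) return -1;
        buf[i] = (uint8_t)((hi << 4) | lo);
    }
    return (int)(n / 2);
}

/* Walk a script made only of direct pushes (opcodes 0x01..0x4b) and record
   the length of each pushed item. Returns the number of pushes, or -1 if
   the script contains any other opcode or is truncated. */
static int push_lengths(const uint8_t *script, int len, int *lens, int maxpushes)
{
    int pos = 0, n = 0;
    while (pos < len) {
        int plen = script[pos++];
        if (plen < 1 || plen > 0x4b || pos + plen > len || n >= maxpushes)
            return -1;
        lens[n++] = plen;
        pos += plen;
    }
    return n;
}
```

Run on the hex string above, this finds two pushes: the sig (with its trailing sighash byte) and the pubkey, each preceded by a 1 byte length.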
The vin itself is encoded as follows:
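struct iguana_spend
{
    uint32_t spendtxidind,scriptoffset:31,coinbase:1; // index of the spent txid, offset of the script in the variable store, coinbase flag
    int16_t prevout; // output index being spent
    uint16_t numsigs:4,numpubkeys:4,p2sh:1,sighash:4,external:1,sequenceid:2; // descriptors for interpreting the metavinscript
} __attribute__((packed));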
The spendtxidind is the index of the txid being spent in the canonical ordering, so instead of a 256 bit txid, it maps to 32 bits. There is another 32 bits (31 plus the coinbase flag) for the scriptoffset, which gives the location of the vinscript in the variable store, and a bunch of descriptors for how to interpret the metavinscript.
The metavinscript takes advantage of the fact that the bulky, high entropy sigs are only needed during validation and by full relaying nodes that need to reconstruct all the raw block data. So we only need to store the minimum info to reproduce the vinscript:
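As a quick sanity check on the field widths, the sketch below repeats the struct and asserts its packed size. The 12 byte result assumes GCC or Clang bitfield layout on a common target; RAW_VIN_FIXED_BYTES is a name I made up for comparison and counts only the fixed fields of a serialized input.

```c
#include <assert.h>
#include <stdint.h>

struct iguana_spend
{
    uint32_t spendtxidind,scriptoffset:31,coinbase:1;
    int16_t prevout;
    uint16_t numsigs:4,numpubkeys:4,p2sh:1,sighash:4,external:1,sequenceid:2;
} __attribute__((packed));

/* 4 + 4 + 2 + 2 bytes: the two 16 bit words follow the two 32 bit words
   with no padding because the struct is packed. */
static_assert(sizeof(struct iguana_spend) == 12,
              "iguana_spend is expected to pack to 12 bytes");

/* Fixed part of a raw serialized input: 32 byte txid, 4 byte output index,
   4 byte sequence. The script length prefix and the script are extra. */
enum { RAW_VIN_FIXED_BYTES = 32 + 4 + 4 };
```

So the fixed part of each input drops from 40 bytes to 12 before the script is even considered.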
03494921
What? How is that even possible, you ask. The reason the 107 bytes can be encoded into 4 bytes is that I take advantage of addresses being reused and scripts being purgeable. The scripts are stored in a growdown stack and the pubkeys are stored in the variable store.
03 -> 3 bytes total. This length is needed to cover any extra vinscript bytes used for nonstandard redeems, like if/else values of 00 or 01 pushed to the stack at the end. It also acts as an error check to make sure there are no size errors.
49 -> size of sigs
49 -> stack offset (negative) where the sigs start
21 -> heap offset (positive) where the pubkeys start
From the vin struct we know how many sigs and how many pubkeys there are. So for a reused address and a standard 107 byte spend, the vinscript is encoded into 4 bytes for all but the first time. The first one does take the 4 bytes extra, but that is recouped overall if just 4% of addresses are reused just once.
So in the standard reused address case, the space goes from ~160 bytes -> 12+6 bytes, or about a 90% reduction in size (after the sigs are purged). With the sigs unpurged, it is only about a 50% reduction.
However, both the vin struct and the metavinscript are compressible and I am seeing about a 50% compression rate, so this means around 20x compression for the reused address case.
If this sort of encoding is used in the raw block itself, then we could get an effective 1.5MB capacity without changing anything about the blocksize, assuming about half the tx spend from reused addresses. I don't have stats on the current composition of reused addresses, so that is just a guesstimate.
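The field layout listed above can be read back with something like the sketch below. It is only an illustration: struct metavin and decode_metavin are hypothetical names, and it assumes every field fits in a single byte as in the 03494921 example (the real encoding uses varints for the offsets, see the P.S.).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of a standard metavinscript. */
struct metavin
{
    uint32_t sigsize;      /* total size of the sigs */
    uint32_t sigoffset;    /* how far back in the growdown stack the sigs start */
    uint32_t pubkeyoffset; /* how far into the variable store the pubkeys start */
};

/* Decode the simple case where each field is one byte.
   data[0] is the number of bytes that follow; anything past the three
   standard fields (extra bytes for nonstandard redeems) is skipped here.
   Returns the bytes consumed, length byte included, or -1 on a size error. */
static int decode_metavin(const uint8_t *data, int datalen, struct metavin *out)
{
    int n;
    assert(data != NULL && out != NULL);
    if (datalen < 1) return -1;
    n = data[0];
    if (n < 3 || datalen < 1 + n) return -1;
    out->sigsize = data[1];
    out->sigoffset = data[2];
    out->pubkeyoffset = data[3];
    return 1 + n;
}
```

For 03494921 this gives a sig size of 0x49 (73), a stack offset of 0x49 (73) and a heap offset of 0x21 (33), consuming all 4 bytes.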
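The arithmetic behind these round numbers can be checked with a few helper functions. This is just my back-of-the-envelope reading of the figures, with made-up function names: 160 bytes shrinking to 12+6 = 18, a 50% compression rate on top, and the earlier 4% figure taken as 4 extra bytes on first use against roughly 103 bytes (107 - 4) saved on a reuse.

```c
#include <assert.h>

/* Percent reduction when origbytes shrink to newbytes. */
static double reduction_percent(double origbytes, double newbytes)
{
    assert(origbytes > 0);
    return 100.0 * (origbytes - newbytes) / origbytes;
}

/* Overall ratio once the encoded form is compressed further;
   compressed_fraction is the size after compression relative to before. */
static double overall_ratio(double origbytes, double encodedbytes,
                            double compressed_fraction)
{
    assert(encodedbytes > 0 && compressed_fraction > 0);
    return origbytes / (encodedbytes * compressed_fraction);
}

/* Fraction of addresses that must be reused once so that the bytes saved
   on reuse pay for the extra bytes spent on every first use. */
static double breakeven_fraction(double extrabytes, double savedbytes)
{
    assert(savedbytes > 0);
    return extrabytes / savedbytes;
}
```

With these inputs, reduction_percent(160, 18) is 88.75, overall_ratio(160, 18, 0.5) is a bit under 18, and breakeven_fraction(4, 103) is a bit under 0.04, which is in line with the rounded 90%, 20x and 4% above.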
James
P.S. I use varints for the script offset encodings, so it is actually going to take more than 1 byte per script and pubkey offset. But even in the worst case of 5 bytes for each, we end up with 11 bytes per standard metavinscript. That is still quite a good amount of savings.
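For reference, here is a sketch of a varint with a 5 byte worst case for 32 bit values. It assumes the Bitcoin-style CompactSize layout (values below 0xfd in one byte, then a 0xfd or 0xfe marker followed by 2 or 4 little-endian bytes); the post does not say which varint format is actually used, and the function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode val into out (at least 5 bytes); returns the number of bytes used. */
static int varint_encode(uint32_t val, uint8_t *out)
{
    assert(out != NULL);
    if (val < 0xfd) {
        out[0] = (uint8_t)val;
        return 1;
    }
    if (val <= 0xffff) {
        out[0] = 0xfd;
        out[1] = (uint8_t)(val & 0xff);
        out[2] = (uint8_t)(val >> 8);
        return 3;
    }
    out[0] = 0xfe;
    out[1] = (uint8_t)(val & 0xff);
    out[2] = (uint8_t)((val >> 8) & 0xff);
    out[3] = (uint8_t)((val >> 16) & 0xff);
    out[4] = (uint8_t)(val >> 24);
    return 5;
}

/* Bytes needed for the three standard fields, not counting the leading
   length byte. */
static int metavin_payload_size(uint32_t sigsize, uint32_t sigoffset,
                                uint32_t pubkeyoffset)
{
    uint8_t tmp[5];
    return varint_encode(sigsize, tmp) + varint_encode(sigoffset, tmp)
         + varint_encode(pubkeyoffset, tmp);
}
```

Under that assumption, the small example values (0x49, 0x49, 0x21) take 3 bytes, matching the leading 03, and a 1 byte sig size plus two worst-case 5 byte offsets gives the 11 bytes mentioned above.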