There doesn't seem to be much interest here either, but I wanted to post some more details in the hope that any flaws will be identified.
The goal for iguana is to create a scalable bitcoin core implementation that is backward compatible and a drop-in replacement, so all the RPC calls need to be implemented in addition to peer messaging, blockchain, scripts, wallet, etc.
The first thing you notice when looking at the raw blockchain is that there is a LOT of redundancy, so by mapping the high-entropy hashes to a 32-bit integer, you get a 28-byte saving for each use. For a txid with N outputs, that is up to N*28 bytes saved, as each vin that spends one of those outputs refers to the txid.
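To make the saving concrete, here is a minimal C sketch of the idea; the struct and field names are illustrative and not the actual iguana layout:

[code]
#include <stdint.h>

/* illustrative only, not the actual iguana structs: a vin that stores the
   full 32-byte txid vs one that stores a 32-bit index into the canonical
   txid table */
struct vin_raw    { uint8_t prev_txid[32]; uint32_t prev_vout; };  /* 36 bytes */
struct vin_packed { uint32_t txidind;      uint32_t prev_vout; };  /*  8 bytes */
/* 36 - 8 = 28 bytes saved every time a vin references a txid */
[/code]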
Since the blockchain has an implicit ordering, it is possible to create a canonical numbering for the txids, vouts and vins, and this will allow syncing the iguana files to save gobs of bandwidth. However, both endian formats need to be put into the bittorrent network, as the iguana files are designed to be directly memory mapped. This allows skipping the serialization/deserialization of each and every multibyte field. Since the files are less than half the size, even with the doubling due to both endian forms it is still less overall data than the raw blockchain.
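A rough sketch of what directly memory mapping a bundle file looks like; the filename and header struct here are placeholders rather than the real iguana file format, the point is just that the mapped structs are used in place with no per-field deserialization:

[code]
#include <stdint.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct bundlehdr { uint32_t numtxids,numunspents,numspends,numpkhashes; };  /* placeholder layout */

int main(void)
{
    int fd = open("bundle.0.dat",O_RDONLY);              /* placeholder name */
    if ( fd < 0 ) return(1);
    struct stat st; fstat(fd,&st);
    void *ptr = mmap(NULL,st.st_size,PROT_READ,MAP_SHARED,fd,0);
    if ( ptr == MAP_FAILED ) return(1);
    struct bundlehdr *hdr = (struct bundlehdr *)ptr;     /* used in place, no parsing pass */
    printf("txids.%u unspents.%u\n",hdr->numtxids,hdr->numunspents);
    munmap(ptr,st.st_size);
    close(fd);
    return(0);
}
[/code]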
Bitfields are used, which means each compiler needs to allocate the bits in the same way. gcc and clang use the same method, so as long as iguana is compiled with those, or with a compiler that has compatible bitfield allocation, the memory mapped files should work.
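For example, something like the following (illustrative fields, not the actual iguana structs) only round-trips safely through a memory mapped file if every compiler that touches it packs the bits the same way:

[code]
#include <stdio.h>
#include <stdint.h>
#include <assert.h>

/* illustrative only: two bitfields sharing one 32-bit storage unit */
struct packed_ref
{
    uint32_t txidind : 28;   /* item index within the bundle */
    uint32_t type : 4;       /* script type tag */
};

int main(void)
{
    /* gcc and clang both pack this into 4 bytes and allocate the bits the
       same way, which is what lets the memory mapped files be shared */
    assert(sizeof(struct packed_ref) == 4);
    printf("sizeof %zu\n",sizeof(struct packed_ref));
    return(0);
}
[/code]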
The most space is used by the vout, vin and pkhash structures, as there are a lot more vouts than txids. The txid structure has not been fully optimized, but since it is less than 1% of overall space, I felt it was not worth making it more complicated to save a few bytes. The pkhash is the pubkey hash, rmd160(sha256(pubkey)), and this is the most efficient way to index the blockchain, as all vout scripts reduce to an rmd160[20]: implicitly for scripts that contain a pubkey, and directly for the standard pay-to-pubkey-hash scripts and all the new p2sh scripts. Another reason to use rmd160 is that it makes it easy to convert to the equivalent addresses for other coins, i.e. all coins have the same rmd160 even if the coin addresses differ due to the address type byte added in the base58 conversion.
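For reference, the 20-byte index key is computed along these lines; this uses OpenSSL's one-shot helpers purely for illustration, iguana has its own hashing code:

[code]
#include <stdint.h>
#include <openssl/sha.h>
#include <openssl/ripemd.h>

/* the 20-byte key everything is indexed by: rmd160(sha256(pubkey)) */
void calc_rmd160_sha256(uint8_t rmd160[20],const uint8_t *pubkey,int len)
{
    uint8_t sha[SHA256_DIGEST_LENGTH];
    SHA256(pubkey,len,sha);               /* sha256 of the pubkey (or script) */
    RIPEMD160(sha,sizeof(sha),rmd160);    /* then rmd160 of that digest */
}
[/code]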
The following are the second-pass data structures that are created from a batch of raw structures in groups of 2000 blocks (the size of a getheaders request). The design needs to meet many constraints, primarily to be as small as possible without sacrificing speed, and to be able to work in a parallel sync. That means each block or bundle needs to be as self-contained as possible, but can have external references that get resolved later. This is similar to object files and linkers. The main things that have these external references are the vins.
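A hypothetical sketch of the object-file/linker analogy for the vins (not the actual iguana layout): a spend either resolves to a txid inside the same bundle, or records an external reference that a later pass links to the bundle that actually contains the txid:

[code]
#include <stdint.h>

#define BUNDLE_SIZE 2000     /* blocks per bundle, the getheaders batch size */

/* hypothetical layout, not the real iguana structs */
struct spend_ref
{
    uint32_t external : 1;   /* 0 -> txidind indexes this bundle's txid table */
    uint32_t txidind : 31;   /* otherwise an index into a per-bundle table of
                                unresolved txid hashes, filled in later       */
    uint16_t prev_vout;      /* which output of that txid is being spent */
};
[/code]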
I tried quite a few variations before settling on this. Earlier versions combined everything into a single dataset, which is good for making hashtable searches really fast, but with the ever-growing size of the blockchain not very scalable. The maximum size of a 2000-block bundle is 2GB right now, and at that size there is no danger of overflowing any 32-bit offset; in any case, the 32-bit indexes mostly refer to items rather than byte offsets, so they can represent much more than 4GB.
iguana doesn't use any DB, as that is what causes most of the bottlenecks, and since the data doesn't change (after 20 blocks), a DB is just overkill. Using the memory mapped file approach, it takes no time to initialize the data structures, but certain operations take linear time relative to the number of bundles. Achieving this requires constant-time operations within a bundle. Since most bundles will not have the hash that is being searched for, I used a bloom filter to quickly determine which bundles need to be searched deeper. For the deeper searches, there is an open hashtable that always has good performance, as it is sized so that it is one-third empty. Since the total number of items is known and never changes, both the bloom filters and the hashtable never change after initial creation.
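A minimal sketch of that per-bundle lookup path, with invented names like bloom_hit and bundle_find (this is not the actual iguana code, just the shape of it):

[code]
#include <stdint.h>
#include <string.h>

struct bundle_index
{
    uint8_t  *bloombits;     /* bit array, set at creation time, never changes */
    uint32_t  bloomsize;     /* number of bits in the bloom filter */
    uint32_t *table;         /* open hashtable of item indexes, 0 == empty slot */
    uint32_t  tablesize;     /* sized so roughly a third of the slots stay empty */
    uint8_t (*hashes)[32];   /* the items themselves, e.g. txids, 1-based indexing */
};

static int bloom_hit(struct bundle_index *bp,const uint8_t hash[32])
{
    /* use two different 32-bit slices of the hash as the bloom probes */
    uint32_t a,b;
    memcpy(&a,hash,4); memcpy(&b,hash + 4,4);
    a %= bp->bloomsize; b %= bp->bloomsize;
    return (bp->bloombits[a >> 3] & (1 << (a & 7))) != 0 &&
           (bp->bloombits[b >> 3] & (1 << (b & 7))) != 0;
}

/* returns the item index (1-based), or 0 if the hash is not in this bundle */
static uint32_t bundle_find(struct bundle_index *bp,const uint8_t hash[32])
{
    if ( bloom_hit(bp,hash) == 0 )
        return(0);                                      /* most bundles stop here */
    uint32_t start; memcpy(&start,hash + 8,4);
    for (uint32_t i=0; i<bp->tablesize; i++)
    {
        uint32_t slot = (start + i) % bp->tablesize, ind = bp->table[slot];
        if ( ind == 0 )
            return(0);                                  /* empty slot -> not present */
        if ( memcmp(bp->hashes[ind - 1],hash,32) == 0 )
            return(ind);                                /* found it */
    }
    return(0);
}
[/code]

Because the set of items never changes after a bundle is built, both structures can be sized once and written straight into the memory mapped file.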
What this means is that on initialization, you memory map the 200 bundles and in the time it takes to do that (less than 1 sec), you are ready to query the dataset. Operations like adding a privkey take a few milliseconds, since all the addresses are already indexed, but caching all the transactions for an address is probably not even necessary for a single-user wallet use case. However, for dealing with thousands of addresses, it would make sense to cache the lists of transactions to save the few milliseconds per address.
You might be wondering how it is even possible to have an append-only dataset that allows traversing the entire blockchain and searching for all transactions from a specific address. Note that by indexing at the rmd160 level, there is no difference between a multisig address and a normal address, so all the operations work equally for any address, be it pubkey, pubkeyhash, multisig or p2sh.
With 200 bundles, and recently millions of unspents per bundle, iterating through everything to find all references to a specific address could easily take a long time. What I realized was that during a single pass I can update an arbitrary number of linked lists, one for each rmd160. However, the lists need to work backwards, so that no previous entry ever needs to be changed. As soon as an unspent is created, it is final and never changes. It does require a single dynamic data structure for the account, which keeps track of the balance (just the sum of outputs) and the last unspentind. As new unspents are made to the same address, each one links back to the prior one and the running balance is updated.
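A sketch of that append-only linking, with illustrative struct and field names (unspent_entry, link_unspent, etc. are not the actual iguana identifiers):

[code]
#include <stdint.h>

/* each unspent links back to the previous unspent for the same rmd160,
   so once it is written it never changes */
struct unspent_entry { uint64_t value; uint32_t prevunspentind; };

/* the only dynamic state per address: running balance and the most recent
   unspentind (1-based, 0 means no unspents yet) */
struct account { uint64_t balance; uint32_t lastunspentind; };

/* called once when unspent number 'ind' paying 'value' to this address is created */
void link_unspent(struct account *acct,struct unspent_entry *u,uint32_t ind,uint64_t value)
{
    u->value = value;
    u->prevunspentind = acct->lastunspentind;   /* link back, never forward */
    acct->lastunspentind = ind;                 /* only the account record mutates */
    acct->balance += value;                     /* balance is just the sum of outputs */
}
[/code]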
To get the list of all transactions, all bundles need to be queried. For each bundle, constant-time hash lookups find the last entry for the address, and iterating backwards to the first occurrence finds all the unspents in the bundle. So it is linear time relative to the total number of unspents that an address has, plus a small overhead of 200 constant-time operations. As the number of bundles grows, this will continue to increase, but it is always possible to make a single lookup table that spans the entire set of bundles, so I am not worried about scaling things up. Note that the parallel nature of the bundles makes using multiple cores for all the searches relatively easy, so a speedup of N using N cores is not much work.
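And a sketch of the traversal that layout allows, where find_account and get_unspent are hypothetical helpers standing in for the constant-time per-bundle hashtable lookups described above:

[code]
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

struct unspent_entry { uint64_t value; uint32_t prevunspentind; };
struct account { uint64_t balance; uint32_t lastunspentind; };

/* hypothetical helpers standing in for the per-bundle hashtable lookups */
extern struct account *find_account(int bundlei,const uint8_t rmd160[20]);
extern struct unspent_entry *get_unspent(int bundlei,uint32_t unspentind);

void list_unspents(int numbundles,const uint8_t rmd160[20])
{
    for (int bundlei=0; bundlei<numbundles; bundlei++)  /* bundles are independent, so this parallelizes */
    {
        struct account *acct = find_account(bundlei,rmd160);
        if ( acct == 0 )
            continue;                                   /* address never seen in this bundle */
        for (uint32_t ind = acct->lastunspentind; ind != 0; )
        {
            struct unspent_entry *u = get_unspent(bundlei,ind);
            printf("bundle.%d unspentind.%u value %" PRIu64 "\n",bundlei,ind,u->value);
            ind = u->prevunspentind;                    /* walk backwards to the first occurrence */
        }
    }
}
[/code]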
I could probably get rid of the firstunspentind in the pkhash struct, but it is there to provide an error check when iterating backwards over the linked list. The pubkeyoffset is also optional, but I find it handy to be able to know what pubkey maps to the rmd160 for a variety of use cases, and I am on the fence as to whether to make it purgeable or not.