So I'm reading this book for my day job - Cassandra - The Definitive Guide - and I came across a section which I thought was particularly relevant to the Bitcoin network and in particular the scaling concern.
The author discusses the usual method for sharding or partitioning a database by dividing records among nodes by some evenly distributed natural key, but then goes on to describe an alternative sharding strategy based on function.
Feature-based sharding or functional segmentation
This is the approach taken by Randy Shoup, Distinguished Architect at eBay, who in 2006 helped bring the site's architecture into maturity to support many billions of queries per day. Using this strategy, the data is split not by dividing records in a single table, but rather by splitting into separate databases for features that don't overlap with each other very much. For example, at eBay, the users are in one shard, and the items for sale are in another. At Flixster, movie ratings are in one shard and comments are in another. This approach depends on understanding your domain so that you can segment data cleanly.
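To make the idea concrete, here's a rough Python sketch of feature-based sharding (the shard names and the router function are mine, purely illustrative, not anything from the book or eBay):

```python
# Hypothetical feature-based sharding: each feature owns its own
# database, and a tiny router picks the right one by feature name.

FEATURE_SHARDS = {
    "users": "db://users-shard",      # user accounts live here
    "items": "db://items-shard",      # items for sale live elsewhere
    "ratings": "db://ratings-shard",  # a la Flixster's ratings/comments split
}

def shard_for(feature: str) -> str:
    """Return the database that owns a given feature's data."""
    try:
        return FEATURE_SHARDS[feature]
    except KeyError:
        raise ValueError(f"no shard owns feature {feature!r}")

print(shard_for("users"))  # db://users-shard
```

The point is that the split follows domain boundaries, not a hash of some record key, which is exactly the contrast the author draws with key-based partitioning.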
This got me thinking about how nodes on the Bitcoin network could be segmented by function. I thought of three separate functions on which nodes could split:
- Validating nodes - These nodes verify the correctness of the blockchain and contribute to its security. They wouldn't necessarily store the entire blockchain since the genesis block, similar to how pruned nodes work today. These nodes would require fast CPUs to run the validation, and at least enough fast storage to hold the UTXO set. You could see businesses that send & receive direct payments in Bitcoin running these nodes. You might also call them "transacting nodes".
- Archiving nodes - These nodes do not verify the correctness of the blockchain, but simply store it for posterity. They obviously would require huge amounts of storage, but not necessarily fast processors or bandwidth. These are the nodes you could envision becoming large data centers, with warehouses full of rolls of tape containing historical transactions. Businesses that offer blockchain analytics or forensics services might choose to run an archiving node.
- Relaying nodes - These nodes neither verify nor store the blockchain, but simply exist to quickly propagate blocks and transactions around the network. As such they don't require fast CPUs or huge amounts of storage, only a fast connection to the internet and plenty of bandwidth.
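One way to picture the split is each role as a set of capabilities a node advertises. This is just my own sketch of the idea (the role and capability names are invented, not anything in the Bitcoin protocol):

```python
# Illustrative only: each proposed node role maps to the set of
# functions it performs on the network.

ROLES = {
    "validating": {"validate", "relay"},  # fast CPU + UTXO-set storage
    "archiving":  {"store", "relay"},     # huge (possibly slow) storage
    "relaying":   {"relay"},              # only bandwidth
}

def can_serve(role: str, request: str) -> bool:
    """Would a node of this role handle the given request type?"""
    return request in ROLES.get(role, set())

# A relaying node forwards blocks but never validates or stores them:
assert can_serve("relaying", "relay")
assert not can_serve("relaying", "validate")
assert not can_serve("relaying", "store")
```

Note that every role still relays, since all three kinds of node have to pass data along to be useful to the rest of the network.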
A couple of issues which immediately spring to mind:
- Wouldn't archiving nodes have to "trust" the validating nodes? Trust them for what? The purpose of an archiving node is simply to store the data it sees on the network and make it available in the future; it passes no judgement on the validity of that data. If you are a business which needs to verify the correctness of the blockchain, you should be running a validating node, not an archiving one.
- What's the incentive to run a relaying node? I don't have a good answer for this.
- Doesn't a validating node have to walk the entire blockchain anyway? There are some interesting discussions in progress about how a node could start validating without having to download each and every historical transaction since the genesis block.
- Does this increase centralization? I'm not sure; I think less so than if every node fulfilled all 3 functions. In this case, you can choose to run a node with the resources you have available, rather than being limited by the resources you don't have. For example, if you have a fast internet connection but only a Raspberry Pi connected to it, you could run a relaying node and still contribute to the network. You're not limited by your Pi's slow CPU or lack of storage. As long as you have at least one of [fast CPU; lots of storage space; fast connection], you can contribute to the Bitcoin network.
- What's the point of having nodes on the network that don't validate? Good question; I think the people advocating the SegWit soft fork don't seem to see a problem with it...
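The "run with what you have" argument above could be sketched as a simple role picker. Again, this is purely illustrative (the function and its inputs are my invention):

```python
# Hypothetical role selection: pick a role from whichever of
# [fast CPU, lots of storage, fast connection] you actually have.

def pick_role(fast_cpu: bool, big_storage: bool, fast_link: bool) -> str:
    """Suggest a node role based on available resources (illustrative)."""
    if fast_cpu:
        return "validating"  # CPU-bound: verify blocks, keep the UTXO set
    if big_storage:
        return "archiving"   # storage-bound: keep the full history
    if fast_link:
        return "relaying"    # bandwidth-bound: just propagate data
    return "none"            # nothing spare to contribute

# A Raspberry Pi on a fast connection: slow CPU, little storage, fast link
print(pick_role(fast_cpu=False, big_storage=False, fast_link=True))  # relaying
```

Under today's model that Pi owner is effectively shut out of running a useful full node; under a functional split they'd still have something to offer.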
This is basically a big shower-thought for me. Does this idea have any merit at all?
BTW my first post on the forum, hello everybody!