Does this idea have any merit? - functional segregation of the network

Matt Davey

New Member
Jul 24, 2016
2
8
United Kingdom
keybase.io
So I'm reading this book for my day job, Cassandra: The Definitive Guide, and I came across a section which I thought was particularly relevant to the Bitcoin network, and in particular to the scaling concern.

The author discusses the usual method for sharding or partitioning a database by dividing records among nodes by some evenly distributed natural key, but then goes on to describe an alternative sharding strategy based on function.

Feature-based sharding or functional segmentation
This is the approach taken by Randy Shoup, Distinguished Architect at eBay, who in 2006 helped bring the site's architecture into maturity to support many billions of queries per day. Using this strategy, the data is split not by dividing records in a single table, but rather by splitting into separate databases for features that don't overlap with each other very much. For example, at eBay, the users are in one shard, and the items for sale are in another. At Flixster, movie ratings are in one shard and comments are in another. This approach depends on understanding your domain so that you can segment data cleanly.
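To make the contrast concrete, here is a minimal sketch of the two strategies. The shard names and both helper functions are hypothetical and are not taken from the book or from eBay's actual architecture.

```python
import hashlib

# Hypothetical shard names, for illustration only.
KEY_SHARDS = ["node-0", "node-1", "node-2"]
FEATURE_SHARDS = {"users": "users-db", "items": "items-db"}


def shard_by_key(record_key: str) -> str:
    """Key-based sharding: hash a natural key to spread records evenly over nodes."""
    digest = hashlib.sha256(record_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(KEY_SHARDS)
    return KEY_SHARDS[index]


def shard_by_feature(feature: str) -> str:
    """Feature-based sharding: all records of one feature live in that feature's database."""
    return FEATURE_SHARDS[feature]
```

With key-based sharding, two user records may land on different nodes. With feature-based sharding, every user record goes to the same place, and items never share a database with users.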

This got me thinking about how nodes on the Bitcoin network could be segmented by function. I thought of three separate functions on which nodes could split:
  • Validating nodes - These nodes verify the correctness and contribute to the security of the blockchain. They wouldn't necessarily store the entire blockchain since the genesis block, similar to how pruned nodes work today. These nodes would require fast CPUs to run the validation, and at least enough fast storage to store the UTXO set. You could see businesses who send & receive direct payments in Bitcoin running these nodes. You might also call them "transacting nodes".
  • Archiving nodes - These nodes do not verify the correctness of the blockchain, but simply store it for posterity. They obviously would require huge amounts of storage, but not necessarily fast processors or bandwidth. These are the nodes you could envision becoming large data centers, with warehouses full of rolls of tape containing historical transactions. Businesses which offer blockchain analytics or forensics services might choose to run an archiving node.
  • Relaying nodes - These nodes neither verify nor store the blockchain, but simply exist to quickly propagate blocks and transactions around the network. As such they don't require fast CPUs or huge amounts of storage, only a fast connection to the internet and plenty of bandwidth.
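The three roles above can be written down as a small table of properties. This is only a sketch of the proposal as described in this post; the `Role` and `RoleProfile` names and the boolean fields are made up for illustration and do not correspond to anything in existing Bitcoin software.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    VALIDATING = "validating"
    ARCHIVING = "archiving"
    RELAYING = "relaying"


@dataclass(frozen=True)
class RoleProfile:
    validates: bool             # checks blocks and transactions for correctness
    stores_full_chain: bool     # keeps every block since the genesis block
    needs_fast_cpu: bool
    needs_large_storage: bool
    needs_high_bandwidth: bool


# Hypothetical profiles matching the three bullets above.
PROFILES = {
    Role.VALIDATING: RoleProfile(
        validates=True, stores_full_chain=False, needs_fast_cpu=True,
        needs_large_storage=False, needs_high_bandwidth=False),
    Role.ARCHIVING: RoleProfile(
        validates=False, stores_full_chain=True, needs_fast_cpu=False,
        needs_large_storage=True, needs_high_bandwidth=False),
    Role.RELAYING: RoleProfile(
        validates=False, stores_full_chain=False, needs_fast_cpu=False,
        needs_large_storage=False, needs_high_bandwidth=True),
}


def roles_that_validate():
    """Return the roles whose nodes actually check what they receive."""
    return [role for role, profile in PROFILES.items() if profile.validates]
```

Laid out this way, it is easy to see that only one of the three roles checks anything.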

A couple of issues which immediately spring to mind:

  • Wouldn't archiving nodes have to "trust" the validating nodes? Trust for what? The purpose of the archiving node is simply to store the data it sees on the network and make it available in the future; it passes no judgement on the validity of that data. If you are a business which needs to verify the correctness of the blockchain, you should be running a validating node, not an archiving one.
  • What's the incentive to run a relaying node? I don't have a good answer for this.
  • A validating node has to walk the entire blockchain anyway? There are some interesting discussions in progress about how a node could start validating without having to download each and every historical transaction since the genesis block.
  • Does this increase centralization? I'm not sure; I think less so than if every node fulfilled all 3 functions. In this case, you can choose to run a node with the resources you have available, rather than being limited by the resources you don't have. For example, if you have a fast internet connection but only a Raspberry Pi connected to it, you could run a relaying node and still contribute to the network. You're not limited by your Pi's slow CPU or lack of storage. As long as you have at least one of [fast CPU; lots of storage space; fast connection], you can contribute to the Bitcoin network.
  • What's the point of having nodes on the network that don't validate? Good question. I think the guys who are advocating the SegWit soft fork don't seem to see a problem with it...
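The "at least one of [fast CPU; lots of storage space; fast connection]" rule from the centralization point can be sketched as a tiny function. The name `roles_for` and its parameters are hypothetical, and the sketch simplifies a little: a validating node would also need enough fast storage for the UTXO set.

```python
from typing import List


def roles_for(fast_cpu: bool, large_storage: bool, fast_connection: bool) -> List[str]:
    """Return the node roles a machine could take on, given the resources it has."""
    roles = []
    if fast_cpu:
        roles.append("validating")
    if large_storage:
        roles.append("archiving")
    if fast_connection:
        roles.append("relaying")
    return roles


# A Raspberry Pi on a fast connection: slow CPU, little storage.
print(roles_for(fast_cpu=False, large_storage=False, fast_connection=True))
# prints ['relaying']
```

A machine with none of the three resources gets an empty list, which is the only case where it could not contribute under this scheme.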


This is basically a big shower-thought for me. Does this idea have any merit at all?

BTW my first post on the forum, hello everybody! :)
 

painlord2k

Member
Sep 13, 2015
30
47
Validating nodes --> Nodes validating the blocks as fast as possible.
They may or may not hold the whole blockchain.
They MUST download the whole blockchain first to check its integrity and then build the UTXO database
They CAN prune the blockchain and keep only the last N blocks (if they have limited storage)
They can accept, check and propagate transactions.
Miners' nodes are like these.

Storage nodes --> Nodes holding all the blocks of the blockchain to distribute them to other nodes
They check the blockchain but their validation speed is not critical
They download the blocks but they do not need a fast connection

Relay Nodes --> They relay txs and/or blocks.
They MUST download and check the blockchain to build a trusted UTXO database
They check incoming transactions against their UTXO database and propagate the valid ones
They download and check the latest blocks and serve them to other nodes.
They only keep the last N blocks of the blockchain
 

Matt Davey

I spent a bit more time thinking about this and I've done a bit of a U-turn on it.

I think the idea of having *any* non-validating nodes on the network, which blindly relay blocks & transactions without checking them, presents a real threat to the network. Under such conditions, a malicious actor could essentially turn such nodes into a botnet to flood the network with invalid blocks/transactions. He knows that he can inject invalid data into the network, and these dumb nodes will happily propagate it far and wide.
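Here is a toy simulation of that concern. It is only a sketch: the `flood` function, the four-node line topology and the node names are invented for illustration and do not model the real Bitcoin peer-to-peer protocol.

```python
from collections import deque


def flood(peers, relay_validates, start, message_is_valid):
    """Breadth-first propagation of one message; returns the set of nodes that received it."""
    reached = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # A validating relay drops invalid data instead of forwarding it.
        # The originating node always sends, whatever it is sending.
        if node != start and relay_validates.get(node, False) and not message_is_valid:
            continue
        for peer in peers[node]:
            if peer not in reached:
                reached.add(peer)
                queue.append(peer)
    return reached


# Hypothetical topology: attacker -> r1 -> r2 -> r3
PEERS = {"attacker": ["r1"], "r1": ["r2"], "r2": ["r3"], "r3": []}

blind = {"r1": False, "r2": False, "r3": False}
checking = {"r1": True, "r2": True, "r3": True}

reached_blind = flood(PEERS, blind, "attacker", message_is_valid=False)
reached_checking = flood(PEERS, checking, "attacker", message_is_valid=False)
```

With blind relays, the invalid message reaches all four nodes. With validating relays, it stops at `r1`, the first node that checks it.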

So unfortunately, I think this is a non-starter. Oh well!
 