Proposal for your critical review, I'm not formally submitting this yet. Please respond with any comments you may have.
Part 1:
INTRODUCTION
If a hard fork becomes possible, we need technologies to be ready to take advantage of the opportunity. Today, the network is hobbled because every full node must receive and process every block and transaction. If M is the number of transactions, then every node's bandwidth, CPU and memory requirements are O(M). Since, every node must process all the data flowing through the entire network, the speed of the network is necessarily limited to the speed of the average node. This BUIP proposes a technology that will allow the Bitcoin network to scale to any level of use, by enabling O(log(M)) scaling by splitting (sharding) the network into regions where data does not need to be shared.
Current sharding proposals fail because they conflate the physicality of a node with a single trusted entity, and assume that every node should be capable of mining. In other words, in today's Bitcoin a node trusts only itself. This proposal separates these concepts into a physical machine (a node) with O(log(M)) requirements and a mining "trust entity" with O(M) requirements. The outcome of a breach of trust is not problematic to the network as a whole -- it is just punishing to the trust entity since the effect is that an invalid block is mined with the resulting loss of revenue.
If a node is not mining, there is no need to be part of a larger trust entity. Therefore, this proposal is no worse than Bitcoin today with respect to scalability of the mining trust entity, and much better for individual non-mining nodes and individual mining hardware.
This proposal does not address how trust entities can be formed because the formation mechanism does not affect the proper execution of this algorithm. The simplest mechanism is that a single individual or corporation can run multiple nodes. While some people may feel that this will encourage centralization of the network, this is already the reality of Bitcoin mining and mining pools today. And so it is foolish to limit Bitcoin's growth in an attempt to win a battle that has already been lost. One cannot fail-to-grow your way to success. However, other trust creation mechanisms are possible, such as mining at lower difficulty or block revenue sharing. Exploration and game-theoretic analysis of these organization mechanisms may result in algorithms that help decentralize the transaction validation and mining network by allowing trust entities to form by mutual self-interest.
Finally, arguably the most important aspect of Bitcoin's trustless nature is permissionless partcipitation. The property is preserved. If an individual cannot afford the hardware necessary to validate the entire network and mine blocks he/she can form a group and do so without the permission of the other participants in the network.
ARCHITECTURE
A new block header format is created that contains the SHA-256 hash of up to 8 shard-blocks. Any transaction may be found in the root block, but shard-block 0 may only contain transactions that contain addresses that (in binary) begins with bX000, shard-block 1's transactions' addresses begins with bX001, 2 bX010, and so on, where X is the prefix 1 for normal addresses and 3 for multisig addresses. This is simply a way of sharding the data, so it makes sense to ignore the common prefix in addresses. This proposal covers common address formats, but other address formats would have their own sharding algorithm which would be similar to what is described here.
Shard-blocks may contain shard-shard-blocks (shard-2-block) with a further specialization of address, recursively. Blocks MAY contain no shard-blocks at all, and do not need to contain shard-blocks across the entire address range. If an attacker creates a gigantic tree of shard-chains with few transactions, this will slow down validation so the block risks being orphaned. An option is to require that parent blocks be at least 75% full (say). However, this approach creates a parallelism problem where a machine working on a child shardchain cannot proceed until it knows which transactions the parent shardchain has used to fill its block. With no parent block fill requirement, transactions can be delivered to the machine working on a child shardchain as they come in.
When a transaction contains addresses with the same prefix, the transaction can be located within the shard-block designated for that prefix, or within any parent block.
Therefore paying across address prefixes requires a transaction in the common parent, which could ultimately be the root block for transactions with no common prefix.
Transactions may be sorted in the merkle tree (to implement fraud proofs) and the transaction format may be flex transactions depending on the status of those BUIPs. Anything not covered will default to the root block to ensure that this proposal can handle anything unanticipated in the submitted transactions.
P2P Network
Nodes will need to discover and keep track of where to access data for particular shardchains. This resource discovery problem in a peer-to-peer network is a well known problem solved by distribute hash table protocols like Chord, Tapestry, and Pastry. However, the number of shardchains is anticipated to be so much smaller than that of nodes and node (for the foreseeable future) that a simpler "degenerate" approach of every node simply storing the shardchain metadata of every other node it knows about is possible. A simple flood query protocol can be used to allow a node to request sources for shardchains.
Nodes will only relay transactions only to nodes that track the relevant shardchain, unless the node has no data about that chain. In that case, the node will relay the transaction to all its connected nodes.
Part 1:
INTRODUCTION
If a hard fork becomes possible, we need technologies to be ready to take advantage of the opportunity. Today, the network is hobbled because every full node must receive and process every block and transaction. If M is the number of transactions, then every node's bandwidth, CPU and memory requirements are O(M). Since, every node must process all the data flowing through the entire network, the speed of the network is necessarily limited to the speed of the average node. This BUIP proposes a technology that will allow the Bitcoin network to scale to any level of use, by enabling O(log(M)) scaling by splitting (sharding) the network into regions where data does not need to be shared.
Current sharding proposals fail because they conflate the physicality of a node with a single trusted entity, and assume that every node should be capable of mining. In other words, in today's Bitcoin a node trusts only itself. This proposal separates these concepts into a physical machine (a node) with O(log(M)) requirements and a mining "trust entity" with O(M) requirements. The outcome of a breach of trust is not problematic to the network as a whole -- it is just punishing to the trust entity since the effect is that an invalid block is mined with the resulting loss of revenue.
If a node is not mining, there is no need to be part of a larger trust entity. Therefore, this proposal is no worse than Bitcoin today with respect to scalability of the mining trust entity, and much better for individual non-mining nodes and individual mining hardware.
This proposal does not address how trust entities can be formed because the formation mechanism does not affect the proper execution of this algorithm. The simplest mechanism is that a single individual or corporation can run multiple nodes. While some people may feel that this will encourage centralization of the network, this is already the reality of Bitcoin mining and mining pools today. And so it is foolish to limit Bitcoin's growth in an attempt to win a battle that has already been lost. One cannot fail-to-grow your way to success. However, other trust creation mechanisms are possible, such as mining at lower difficulty or block revenue sharing. Exploration and game-theoretic analysis of these organization mechanisms may result in algorithms that help decentralize the transaction validation and mining network by allowing trust entities to form by mutual self-interest.
Finally, arguably the most important aspect of Bitcoin's trustless nature is permissionless partcipitation. The property is preserved. If an individual cannot afford the hardware necessary to validate the entire network and mine blocks he/she can form a group and do so without the permission of the other participants in the network.
ARCHITECTURE
A new block header format is created that contains the SHA-256 hash of up to 8 shard-blocks. Any transaction may be found in the root block, but shard-block 0 may only contain transactions that contain addresses that (in binary) begins with bX000, shard-block 1's transactions' addresses begins with bX001, 2 bX010, and so on, where X is the prefix 1 for normal addresses and 3 for multisig addresses. This is simply a way of sharding the data, so it makes sense to ignore the common prefix in addresses. This proposal covers common address formats, but other address formats would have their own sharding algorithm which would be similar to what is described here.
Shard-blocks may contain shard-shard-blocks (shard-2-block) with a further specialization of address, recursively. Blocks MAY contain no shard-blocks at all, and do not need to contain shard-blocks across the entire address range. If an attacker creates a gigantic tree of shard-chains with few transactions, this will slow down validation so the block risks being orphaned. An option is to require that parent blocks be at least 75% full (say). However, this approach creates a parallelism problem where a machine working on a child shardchain cannot proceed until it knows which transactions the parent shardchain has used to fill its block. With no parent block fill requirement, transactions can be delivered to the machine working on a child shardchain as they come in.
When a transaction contains addresses with the same prefix, the transaction can be located within the shard-block designated for that prefix, or within any parent block.
Therefore paying across address prefixes requires a transaction in the common parent, which could ultimately be the root block for transactions with no common prefix.
Transactions may be sorted in the merkle tree (to implement fraud proofs) and the transaction format may be flex transactions depending on the status of those BUIPs. Anything not covered will default to the root block to ensure that this proposal can handle anything unanticipated in the submitted transactions.
P2P Network
Nodes will need to discover and keep track of where to access data for particular shardchains. This resource discovery problem in a peer-to-peer network is a well known problem solved by distribute hash table protocols like Chord, Tapestry, and Pastry. However, the number of shardchains is anticipated to be so much smaller than that of nodes and node (for the foreseeable future) that a simpler "degenerate" approach of every node simply storing the shardchain metadata of every other node it knows about is possible. A simple flood query protocol can be used to allow a node to request sources for shardchains.
Nodes will only relay transactions only to nodes that track the relevant shardchain, unless the node has no data about that chain. In that case, the node will relay the transaction to all its connected nodes.