There is some very interesting maths in this problem. We know that some peers cannot accept inbound connections and only open outbound ones. Seen from outside the machine it runs on, such a node can only count as an "inbound" of some other peers, and can never count as an "outbound" of any peer. We can call such nodes "nodes behind wall", and call the peers that can handle both inbound and outbound connections "open nodes".
Assumption
Let us assume every node initiates exactly 8 outbound connections.
Analysis
If every peer in the world were an open node, and we could collect the statistics from all of them, the total sum of inbound connections, C_in, would equal the total sum of outbound connections, C_out:
C_in = C_out
Now suppose some nodes are behind the wall. If we count only the open nodes' inbound and outbound connections, we will find that the total sum of inbound becomes larger than the total sum of outbound:
C_in > C_out
The more nodes behind wall there are compared with open nodes, the larger the difference between C_in and C_out:
C_in : C_out = (total number of nodes) : (number of open nodes)
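The ratio follows from the assumption: every node, open or behind wall, opens 8 outbound connections, and all of them land as inbound on open nodes, while only the open nodes' own outbound is counted in C_out. A minimal Python sketch to check this on a toy network (the function name, the node counts and the uniform random peer selection are assumptions for illustration, not a model of real peer selection):

```python
import random

OUTBOUND_PER_NODE = 8  # the assumption stated above


def simulate(num_open, num_walled, seed=0):
    """Return (C_in, C_out) summed over open nodes only."""
    rng = random.Random(seed)
    inbound = [0] * num_open  # inbound count per open node
    # Nodes 0..num_open-1 are open, the rest are behind wall.
    # Every node dials OUTBOUND_PER_NODE distinct open nodes (never itself).
    for node in range(num_open + num_walled):
        candidates = [i for i in range(num_open) if i != node]
        for target in rng.sample(candidates, OUTBOUND_PER_NODE):
            inbound[target] += 1
    c_in = sum(inbound)
    c_out = OUTBOUND_PER_NODE * num_open  # outbound of walled nodes is not observed
    return c_in, c_out


c_in, c_out = simulate(num_open=100, num_walled=150)
print(c_in, c_out, c_in / c_out)  # ratio equals (100 + 150) / 100
```

Whatever the random choices are, the sums come out the same, because each connection is counted once as an inbound on some open node.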
We have no way to collect the statistics from every node to get the real C_in and C_out. However, we can infer the ratio from sample data.
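As a rough sketch of that inference, the ratio can be estimated from a handful of sampled open nodes by summing their inbound and outbound counts. The function name and the sample values below are made up for illustration and are not measurements:

```python
def estimate_ratio(samples):
    """samples: list of (inbound, outbound) counts, one pair per sampled open node."""
    total_in = sum(inbound for inbound, _ in samples)
    total_out = sum(outbound for _, outbound in samples)
    if total_out == 0:
        raise ValueError("no outbound connections in the sample")
    return total_in / total_out


# Hypothetical sample of four open nodes (made-up numbers).
sample = [(12, 8), (20, 8), (16, 8), (16, 8)]
ratio = estimate_ratio(sample)
print(ratio)  # 64 / 32 = 2.0
```

Summing before dividing, rather than averaging per-node ratios, matches the definition of C_in and C_out as totals.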
Conclusion:
We can use this method to estimate how many full nodes are behind the wall. For example, in my experience a node usually has 20-30 total connections, of which only 8 are outbound.
Right now there are 5.2K full nodes on the network that are "open nodes" (https://coin.dance/nodes/all).
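With 8 outbound, that leaves 12 to 22 inbound connections per open node. Applying the ratio C_in : C_out from above, we can estimate that there are around 12/8*5.2K=7.8K to 22/8*5.2K=14.3K full nodes in total, of which 2.6K~9.1K are nodes behind walls.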
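The arithmetic can be sketched in a few lines of Python, applying the C_in : C_out ratio from the Analysis section and taking inbound as total connections minus the 8 outbound. Dividing the total connection count by 8 instead would overstate the ratio by exactly 1. The function name is an assumption for illustration:

```python
def estimate_nodes(open_nodes, total_conns, outbound=8):
    """Return (estimated total nodes, estimated nodes behind wall)."""
    inbound = total_conns - outbound
    total = open_nodes * inbound / outbound  # C_in : C_out = total : open
    return total, total - open_nodes


for conns in (20, 30):
    total, walled = estimate_nodes(5200, conns)
    print(conns, total, walled)
# 20 7800.0 2600.0
# 30 14300.0 9100.0
```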
However, every node that opens 8*n outbound connections is counted as n nodes, so each such node should reduce the estimated number of nodes behind wall by n-1.
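A hedged sketch of that correction: given the outbound counts of nodes known to open more than 8 connections, subtract n-1 for each of them. The function name and the example counts are hypothetical:

```python
def correct_for_heavy_nodes(estimated_walled, heavy_outbound_counts, outbound=8):
    """Subtract the surplus (n - 1) for each node that opens 8*n outbound."""
    surplus = sum(count / outbound - 1 for count in heavy_outbound_counts)
    return estimated_walled - surplus


# Hypothetical: three nodes opening 80, 800 and 8 outbound connections.
print(correct_for_heavy_nodes(2600, [80, 800, 8]))  # 2600 - (9 + 99 + 0) = 2492.0
```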
Discussion:
Some nodes on the network will try to initiate thousands of outbound connections. This breaks the assumption above. The more such nodes there are, the further our estimate rises above the real number. However, if you run lots of peers on the network, you can easily see that some peers show up again and again in the peer lists of your different nodes. You can recognize them by IP or by client name, and delete this invalid data.
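One possible way to drop such repetitive peers, sketched in Python. It matches by IP only, and the 50% threshold, the function name and the example addresses (taken from documentation ranges) are assumptions for illustration:

```python
from collections import Counter


def drop_repetitive_peers(peer_lists, max_share=0.5):
    """peer_lists: one list of inbound peer IPs per observation node.

    Drops every IP seen at more than max_share of the observation nodes.
    """
    # Count at how many observation nodes each IP appears (once per node).
    seen_at = Counter(ip for peers in peer_lists for ip in set(peers))
    limit = max_share * len(peer_lists)
    flagged = {ip for ip, n in seen_at.items() if n > limit}
    cleaned = [[ip for ip in peers if ip not in flagged] for peers in peer_lists]
    return cleaned, flagged


peer_lists = [
    ["203.0.113.5", "192.0.2.10"],
    ["203.0.113.5", "192.0.2.11"],
    ["203.0.113.5", "192.0.2.12"],
    ["192.0.2.13"],
]
cleaned, flagged = drop_repetitive_peers(peer_lists)
print(flagged)  # {'203.0.113.5'}
```

The same idea works with client names as the key instead of IPs, at the cost of also dropping ordinary peers that share a popular client string.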
The observation nodes themselves have an impact on the estimation result. The more observation nodes there are, and the more widely they are distributed across the IP address space, the more accurate the result is. But as long as the total number of observation peers is only in the tens, their impact on the result is minimal, and it is easy to remove their data.