@Peter R I just had a thought which I want to run by our resident theorist.
Let's first run through the mining algorithm:
Basically miners first receive a block header and start mining a 0-tx block on top of it. They can't include any txns because they don't yet know which txns were "used" (spent) by the block whose body they haven't received.
Next they start receiving and validating the block. When that is complete, they switch to mining a non-0-tx block on top of it to collect as many txn fees as possible.
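Here's a rough sketch of that two-phase behaviour in Python; the names (`BlockTemplate`, `choose_template`, etc.) are made up for illustration, not actual miner/pool code:

```python
from dataclasses import dataclass, field

@dataclass
class BlockTemplate:
    prev_hash: str                            # header we are building on
    txs: list = field(default_factory=list)   # empty list => 0-tx block

def choose_template(new_header_hash: str, body_validated: bool, mempool: list) -> BlockTemplate:
    if not body_validated:
        # Phase 1: we only know the header. Any mempool tx might already be
        # spent by the unseen block body, so include none.
        return BlockTemplate(prev_hash=new_header_hash)
    # Phase 2: body downloaded and validated, conflicting txs evicted from
    # the mempool, so fill the template to collect fees.
    return BlockTemplate(prev_hash=new_header_hash, txs=list(mempool))
```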
Therefore the average tx/second is essentially defined by the miners' network and validation capacity.
No limits required. If a miner produces a huge block that exceeds the network's average capacity to process it, a few 0-tx blocks will follow it. As miners upgrade their network or signature-validation infrastructure, the Bitcoin network as a whole will "naturally" produce fewer 0-tx blocks, resulting in higher throughput!!!
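A back-of-the-envelope way to put numbers on this, assuming Poisson block arrivals with a 600 s mean and a fixed propagation+validation time per block (both simplifications of mine, not a full model):

```python
import math

BLOCK_INTERVAL = 600.0   # mean seconds between blocks (Poisson arrivals assumed)

def empty_block_fraction(validation_seconds: float) -> float:
    # Chance the next block is found while the previous one is still being
    # relayed/validated, i.e. the chance it gets mined as a 0-tx block.
    return 1.0 - math.exp(-validation_seconds / BLOCK_INTERVAL)

def effective_tps(txs_per_full_block: int, validation_seconds: float) -> float:
    # Average throughput once the 0-tx blocks are accounted for.
    return (1.0 - empty_block_fraction(validation_seconds)) * txs_per_full_block / BLOCK_INTERVAL

# e.g. a 10k-tx block that takes 30 s to relay + validate:
print(empty_block_fraction(30.0))     # ~0.049 -> roughly 5% empty blocks
print(effective_tps(10_000, 30.0))    # ~15.9 tx/s instead of ~16.7
```

So faster relay/validation directly shrinks the empty-block fraction and lifts throughput, which is exactly the "natural" feedback above.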
This seems like a much more powerful mechanism for curtailing the average bandwidth than the fork mechanism. But the network as a whole (the users) does not care (much) whether we had 1 block with 10k tx and then 2 with 0, or 3 blocks with ~3.3k tx each. In fact, the former is "better" because more txns get more confirmations sooner and the mempool clears out.
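One toy way to quantify "more confirmations sooner" (a metric I'm making up here just for illustration): count the total tx-confirmations accumulated once all three blocks are mined.

```python
def total_confirmations(txs_per_block: list[int]) -> int:
    # A tx confirmed in the i-th block (0-indexed) has (n - i) confirmations
    # once all n blocks are mined; sum that over every tx.
    n = len(txs_per_block)
    return sum(txs * (n - i) for i, txs in enumerate(txs_per_block))

print(total_confirmations([10_000, 0, 0]))          # 30000
print(total_confirmations([3_333, 3_333, 3_334]))   # 19999
```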
It seems like we should be able to put some math behind that and also look at past 0-txn block history to see the effect happening "live".
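For the "live" part, something like the following could pull the 0-tx block rate out of chain history, assuming a local Bitcoin Core node with JSON-RPC enabled (URL and credentials below are placeholders):

```python
import requests

RPC_URL = "http://127.0.0.1:8332"
RPC_AUTH = ("rpcuser", "rpcpassword")   # placeholder credentials

def rpc(method, *params):
    resp = requests.post(RPC_URL, auth=RPC_AUTH,
                         json={"jsonrpc": "1.0", "id": "emptyblocks",
                               "method": method, "params": list(params)})
    resp.raise_for_status()
    return resp.json()["result"]

def empty_block_rate(start_height: int, end_height: int) -> float:
    # Fraction of blocks in the range that contain only the coinbase tx.
    empty = total = 0
    for height in range(start_height, end_height + 1):
        block = rpc("getblock", rpc("getblockhash", height))
        if len(block["tx"]) == 1:   # coinbase only => 0-tx block
            empty += 1
        total += 1
    return empty / total

# e.g. empty_block_rate(350_000, 350_999)
```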