BU dev mini-project: FastFilter

theZerg

Moderator
Staff member
Aug 28, 2015
1,012
2,327
(This work can be paid for by the BUIP062 (devpool) development, please contact me first to work out details)

The giga_perf branch in the github BitcoinUnlimited repo contains a class similar to a bloom filter but optimized for speed, located in fastfilter.cpp and fastfilter.h. To move this work into the dev and release branch the following needs to be done:

Please read https://en.wikipedia.org/wiki/Bloom_filter to understand the context of these requirements.

1. Add constructor where "k" and "m" (# of bits in the bloom filter array) are specified. "m" must be a power of 2.

2. Add constructor where the # of items to add to the bloom filter and the probability of a collision are specified. Calculate "k" and "m" from those as described in: https://en.wikipedia.org/wiki/Bloom_filter. Round "m" up to a power of 2.

3. Fix the "insert", "contains", and "checkAndSet" functions to handle a user-defined "k" value. As is done for k=1, efficiently pull each "hash" from a different location in the input data (nothing fancy is needed, it could be just the next few bytes) rather than running a hash function to generate it.

4. Add an option in the constructor that enables the following functionality: XOR each "hash" with a value randomly chosen at object construction time. This stops an attacker from being able to deliberately create collisions.

5. The current fastfilter uses bitwise operations like & rather than %, since "m" is a power of 2. Don't lose this efficiency!

6. Make the needed changes to CRollingFastFilter to support different sized filters (should be easy, its a 25 line class).

7. add a c++ boost test-framework-compatible unit test for CFastFilter and CRollingFastFilter. See src/test for tons of examples of unit tests.
 

awemany

Well-Known Member
Aug 19, 2015
1,387
5,054
4. Add an option in the constructor that enables the following functionality: XOR each "hash" with a value randomly chosen at object construction time. This stops an attacker from being able to deliberately create collisions.
How does that work in your fast filter case?

Isn't the fast filter just selecting a particular subset of bits out of the uint256? How does XORing that with a determined value prevent malicious collisions?
 

theZerg

Moderator
Staff member
Aug 28, 2015
1,012
2,327
You're right, now that I really think about it. The XOR permutes the locations in the array of the bits set by each insertion. But unfortunately it does so in a way that would preserve collisions. If object A and B collided in node 1, they'd collide in node 2 even though the collisions would happen at different array locations because the array size is a known power of 2.

To reduce the chance of collisions we could XOR data from a few locations in the uint256 instead of XORing with a constant.

I originally thought that increasing the randomness would be useful because I wanted to pick bits efficiently. For example, rather then grabbing them completely at random its efficient to grab aligned contiguous groups of N bits. I wonder what would be faster for the same malicious collision strength: picking smaller contiguous bit groups, or XORing larger bit groups.

Now I'm wondering if we even need this. I'm thinking that, if we pick contiguous bits from K random locations (in a 256 bit string), wrapping around at 64 bit boundaries (for efficiency -- this allows us to load a 64 bit integer and then rotate it) that would be (pls check my reasoning) 256^K different possibilities.

Even at K=1, that's only going to collide on < .5% of the nodes.
 
  • Like
Reactions: awemany

awemany

Well-Known Member
Aug 19, 2015
1,387
5,054
Even at K=1, that's only going to collide on < .5% of the nodes.
Which might or might not be ok, depending on the attack surface:

- If it is something that a foreign node can probe easily, doing 256 probes is not going to stop any attacker.
- If it means 0.5% of nodes have temporarily just lightly higher load because an attacking miner spent POW to make it so, then that might be ok.

Have you done measurements on FastFilter performance compared to more secure alternatives? For the gigablocks initiative, have you done analysis to see what fraction is actually spent in hashing for TX index access?

To prevent malicious probing, we also don't need the full 256bit security; 32..64bit likely suffice. Maybe a hybrid approach where just enough bits come out of a more expensive hashing function would be the sweet spot?