Bloom Filter
Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set
Properties
- False positives are possible, but false negatives are not in this structure
- Elements can be added to the set, but not removed
- The more elements that are added to the set, the larger the probability of false positives
Setup
Both insertions and membership queries are performed in constant time.
A Bloom filter is a bit vector of bits, with k independent hash functions that map each key in to the set .
- Initially all m bits of B are set to 0.
- Insert into Bloom filter, Compute and set .
- Query, Compute . If then answer Yes, else answer No.
Analysis
False positive rate probability
Probability that one hash do not set a given bit is where is number of bits
Probability that a bit is not set by any of hash functions for a given input is
Probability that a bit is not set after inserting n elements is
Probability of false positive means specific set of k bits should be equal to 1 i.e,
Applying log on both sides
To minimize false positive rate with respect to k
derivative is zero when
Compound error probability after merging
Compound probability after merging n bloom filters is
Bloom filter tricks
- Union of bloom filters of same size
- Intersection of bloom filters of same size
- Shrink bloom filter size if the number of bits in bloom filter is power of 2, This can be done by ignoring least significant bit after hash
Limitations
A limitation of standard Bloom filters is that one cannot remove existing items without rebuilding the entire filter (or possibly introducing generally less desirable false negatives).
Counting Bloom filters
In a counting Bloom filter, each entry in the Bloom filter is not a single bit but rather a small counter. When an item is inserted, the corresponding counters are incremented; when an item is deleted, the corresponding counters are decremented. But they generally use 3–4x space to retain the same false positive rate as a space-optimized Bloom filter.