Cuckoo Filter
Similar to BloomFilter with major advantages
- Supports adding and removing items dynamically
- Provides higher lookup performance than traditional Bloom filters, even when close to full
- It uses less space than Bloom filters in many practical applications, if the target false positive rate is less than 3%
Cuckoo filter internally uses optimized Cuckoo Hashing technique called partial-key cuckoo hashing
Partial Key cuckoo hashing
To make cuckoo filters highly space efficient, we use a multi-way associative cuckoo hash table to provide high-speed lookup and high table occupancy (e.g., 95% hash table slots filled); to further reduce the hash table size, each item is first hashed into a constant-sized fingerprint before inserted into this hash table.
Insert
// Insert(x)
f = fingerprint(x)
i1 = hash(x)
i2 = i1 ^ hash(f) // XOR operation
if bucket[i1] or bucket[i2] has an empty entry then
add f to that bucket;
return Success;
// Need to relocate existing items
i = randomly pick i1 or i2
for n = 0; n < MaxNumKicks; n++ do
randomly select an entry e from bucket[i]
swap f and the fingerprint stored in entry e
i = i ^ hash(f)
if bucket[i] has an empty entry then
add f to bucket[i]
return Success
// Hashtable is considered full
return Failure
Lookup
// Lookup(x)
f = fingerprint(x)
i1 = hash(x)
i2 = i1 ^ hash(f) // XOR operation
if bucket[i1] or bucket[i2] has f then
return True
return False
Delete
// Delete(x)
f = fingerprint(x)
i1 = hash(x)
i2 = i1 ^ hash(f) // XOR operation
if bucket[i1] or bucket[i2] has f then
remove a copy of f from this bucket
return True
return False