Two implementations in Mojo. Uses xxHash.
A Bloom filter is a space-efficient probabilistic data structure designed to test whether an element is a member of a set. It can tell you if an item is definitely not in the set or might be in the set. The key properties:
- No false negatives: If the filter says an item is not present, it's definitely not there
- Possible false positives: If the filter says an item is present, it might be wrong
- Space efficient: Uses much less memory than storing the actual items
- Fast operations: O(k) time for both add and contains, where k is the number of hash functions
- Fixed size: Must know approximate number of items in advance
Common use cases include web crawlers (detecting already-visited URLs), databases (avoiding unnecessary disk lookups), and distributed systems (reducing network calls).
This project provides two Bloom filter variants:
- Standard Bloom Filter - Traditional implementation with enhanced double hashing
- Split Block Bloom Filter - SIMD-optimized variant for better cache performance
Inspired by:
- bits-and-blooms/bloom - High-performance Go implementation used by Milvus and Beego
- Split Block Bloom Filters - Research on cache-efficient Bloom filter design
git clone https://github.com/seif/mojo-bloomfilter.git
cd mojo-bloomfilter
Standard:
from standard import StandardBloomFilter
var bf = StandardBloomFilter.create_for_fpr(10000, 0.01)
bf.add(data)
if bf.contains(data):
print("maybe there")
Split block (SIMD):
from splitblock import SplitBlockBloomFilter
var sbf = SplitBlockBloomFilter.create_for_fpr(10000, 0.01)
sbf.add(data)
if sbf.contains(data):
print("maybe there")
-
Standard: predictable memory, smaller datasets.
-
Split block: cache efficient, large datasets, SIMD.
Both have the same interface.
create_for_fpr(n, fpr)
- for target false positive ratecreate_for_bpv(n, bpv)
- for bits per value
add(data)
- add elementcontains(data)
- check elementadd_many(items)
- add multiplecontains_many(items)
- check multiple
merge(other)
- merge filtersintersect(other)
- intersect filtersclear()
- resetserialize()
- to bytesdeserialize(data)
- from bytes
bits_set()
- count set bitsfill_ratio()
- fill percentageestimated_cardinality()
- estimate unique itemscurrent_fpr()
- actual false positive rateshould_rotate(target_fpr)
- check if needs replacement
magic run mojo test_bloomfilters.mojo
magic run mojo benchmark_comparison.mojo
BSD 2-Clause License