Python bindings for the Laurus search engine. Provides lexical search, vector search, and hybrid search from Python via a native Rust extension built with PyO3 and Maturin.
- Lexical Search -- Full-text search powered by an inverted index with BM25 scoring
- Vector Search -- Approximate nearest neighbor (ANN) search using Flat, HNSW, or IVF indexes
- Hybrid Search -- Combine lexical and vector results with fusion algorithms (RRF, WeightedSum)
- Rich Query DSL -- Term, Phrase, Fuzzy, Wildcard, NumericRange, Geo, Boolean, Span queries
- Text Analysis -- Tokenizers, filters, stemmers, and synonym expansion
- Flexible Storage -- In-memory (ephemeral) or file-based (persistent) indexes
- Pythonic API -- Clean, intuitive Python classes with full type information
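
To make the BM25 scoring mentioned in the feature list more concrete, here is a minimal pure-Python sketch of the textbook formula. It is an illustration only, not Laurus's implementation: the function name `bm25_scores`, the parameter values `k1=1.2` and `b=0.75`, and the whitespace tokenization are all assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` with BM25.

    k1 and b are common textbook defaults, not values taken from Laurus.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Rarer terms get a larger inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Longer-than-average documents are penalised through `b`.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "introduction to rust".split(),
    "python for data science".split(),
    "rust and python interop".split(),
]
scores = bm25_scores(["rust"], docs)
```

Documents that do not contain the query term score zero, and among matching documents the shorter one ranks higher because of length normalisation.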

pip install laurus

To build from source (requires a Rust toolchain):

pip install maturin
maturin develop

import laurus
# Create an in-memory index
index = laurus.Index()
# Index documents
index.put_document("doc1", {"title": "Introduction to Rust", "body": "Systems programming language."})
index.put_document("doc2", {"title": "Python for Data Science", "body": "Data analysis with Python."})
index.commit()
# Search with a DSL string
results = index.search("title:rust", limit=5)
for r in results:
    print(f"[{r.id}] score={r.score:.4f} {r.document['title']}")
# Search with a query object
results = index.search(laurus.TermQuery("body", "python"), limit=5)

# In-memory (ephemeral) index
index = laurus.Index()

# File-based (persistent) index with an explicit schema
schema = laurus.Schema()
schema.add_text_field("title")
schema.add_text_field("body")
schema.add_hnsw_field("embedding", dimension=384)
index = laurus.Index(path="./myindex", schema=schema)

A persistent index writes every change to a write-ahead log (WAL). By default
the WAL is fsync-ed on every record, so each write is fully durable. Opt into
group commit to batch fsync for higher write throughput (a crash can lose up
to the last unsynced batch, like SQLite's synchronous = NORMAL):
import laurus
policy = laurus.WalSyncPolicy.group(max_records=4096, max_interval_ms=1000)
index = laurus.Index(path="./myindex", schema=schema, wal_sync_policy=policy)
index.put_document("doc1", {"title": "Hello"})
index.flush_wal()  # force a durable barrier on demand
index.commit()  # also flushes the WAL

Omit wal_sync_policy (or pass laurus.WalSyncPolicy.per_record()) to keep
the default per-record durability.
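
The trade-off between per-record sync and group commit can be illustrated with a toy model. The sketch below is not Laurus's WAL: `GroupCommitLog` and its attributes are hypothetical names, and it only counts how many fsync calls a policy shaped like `group(max_records, max_interval_ms)` might issue, assuming a sync is triggered as soon as either threshold is reached.

```python
import time


class GroupCommitLog:
    """Toy in-memory log that counts how often it would fsync (hypothetical)."""

    def __init__(self, max_records, max_interval_ms, clock=time.monotonic):
        self.max_records = max_records
        self.max_interval_s = max_interval_ms / 1000.0
        self.clock = clock
        self.pending = 0      # records written but not yet synced
        self.sync_count = 0   # number of simulated fsync calls
        self.last_sync = clock()

    def append(self, record):
        # The record payload is ignored; only the bookkeeping is modelled.
        self.pending += 1
        elapsed = self.clock() - self.last_sync
        if self.pending >= self.max_records or elapsed >= self.max_interval_s:
            self.flush()

    def flush(self):
        if self.pending:
            self.sync_count += 1  # a real log would call os.fsync() here
        self.pending = 0
        self.last_sync = self.clock()


log = GroupCommitLog(max_records=4, max_interval_ms=60_000)
for i in range(10):
    log.append(f"record-{i}")

# Records that a crash at this point could lose.
unsynced_before_flush = log.pending
log.flush()  # explicit durable barrier, analogous to flush_wal()
```

In this model ten writes cost three syncs instead of ten, at the price of a window in which the last unsynced records could be lost on a crash.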
| Query class | Description |
|---|---|
| `TermQuery(field, term)` | Exact term match |
| `PhraseQuery(field, [terms])` | Ordered phrase match |
| `FuzzyQuery(field, term, max_edits)` | Approximate term match |
| `WildcardQuery(field, pattern)` | Wildcard pattern match (`*`, `?`) |
| `NumericRangeQuery(field, min, max)` | Numeric range (int or float) |
| `GeoDistanceQuery.within_radius(field, lat, lon, distance_m)` | Geo-distance radius search |
| `GeoBoundingBoxQuery.within_bounding_box(field, min_lat, min_lon, max_lat, max_lon)` | Geo bounding-box search |
| `Geo3dDistanceQuery.within_sphere(field, x, y, z, distance_m)` | 3D ECEF sphere search |
| `Geo3dBoundingBoxQuery.within_box(field, min_x, min_y, min_z, max_x, max_y, max_z)` | 3D ECEF AABB search |
| `Geo3dNearestQuery.k_nearest(field, x, y, z, k)` | 3D ECEF k-nearest neighbours |
| `BooleanQuery(must, should, must_not)` | Compound boolean logic |
| `SpanNearQuery(field, [terms], slop)` | Proximity / ordered span match |
| `VectorQuery(field, vector)` | Pre-computed vector similarity |
| `VectorTextQuery(field, text)` | Text-to-vector similarity (requires embedder) |

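
To give a feel for what "approximate term match" and "wildcard pattern match" mean in the table above, the following standalone sketch filters a small term list with a Levenshtein distance bound and with shell-style patterns. It does not call Laurus and makes no claim about how `FuzzyQuery` or `WildcardQuery` work internally; treating `max_edits` as a Levenshtein bound is an assumption, and Python's `fnmatch` also interprets `[...]` character classes, which may differ from Laurus's wildcard syntax.

```python
from fnmatch import fnmatchcase


def edit_distance(a, b):
    """Levenshtein distance using the classic two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


terms = ["rust", "rusty", "trust", "python", "ruby"]

# Terms within one edit of "rust" (assumed meaning of max_edits=1).
fuzzy_hits = [t for t in terms if edit_distance("rust", t) <= 1]

# Terms matching the pattern "ru*" (* = any run of characters).
wildcard_hits = [t for t in terms if fnmatchcase(t, "ru*")]
```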

request = laurus.SearchRequest(
    lexical_query=laurus.TermQuery("body", "rust"),
    vector_query=laurus.VectorQuery("embedding", query_vec),
    fusion=laurus.RRF(k=60.0),
    limit=10,
)
results = index.search(request)

| Class | Description |
|---|---|
| `RRF(k=60.0)` | Reciprocal Rank Fusion (rank-based, default for hybrid) |
| `WeightedSum(lexical_weight=0.5, vector_weight=0.5)` | Score-normalised weighted sum |

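
The two fusion strategies can be sketched in a few lines of plain Python. This is an illustration of the general techniques rather than Laurus's code: the function names `rrf` and `weighted_sum` are hypothetical, and min-max scaling is only one assumed way to normalise scores before a weighted sum.

```python
def rrf(rankings, k=60.0):
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over every ranked list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def weighted_sum(lexical, vector, lexical_weight=0.5, vector_weight=0.5):
    """Min-max normalise each score set, then combine with the given weights."""

    def normalise(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        # If all scores are equal, treat every document as a full match.
        return {d: (s - lo) / span if span else 1.0 for d, s in scores.items()}

    lex, vec = normalise(lexical), normalise(vector)
    fused = {
        d: lexical_weight * lex.get(d, 0.0) + vector_weight * vec.get(d, 0.0)
        for d in set(lex) | set(vec)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Rank-based fusion only needs the order of each result list.
rrf_order = [d for d, _ in rrf([["doc1", "doc3", "doc2"],
                                ["doc3", "doc2", "doc1"]])]

# Score-based fusion needs the raw scores from each search.
ws_order = [d for d, _ in weighted_sum(
    {"doc1": 2.0, "doc3": 1.5, "doc2": 0.5},
    {"doc3": 0.9, "doc2": 0.8, "doc1": 0.1},
)]
```

RRF ignores score magnitudes, which makes it robust when lexical and vector scores live on different scales; a weighted sum lets you bias the result toward one signal but depends on how the scores are normalised.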
syn_dict = laurus.SynonymDictionary()
syn_dict.add_synonym_group(["ml", "machine learning"])
tokenizer = laurus.WhitespaceTokenizer()
filt = laurus.SynonymGraphFilter(syn_dict, keep_original=True, boost=0.8)
tokens = tokenizer.tokenize("ml tutorial")
tokens = filt.apply(tokens)
for tok in tokens:
    print(tok.text, tok.position, tok.boost)

Usage examples are in the examples/ directory:

| Example | Description |
|---|---|
| quickstart.py | Basic indexing and full-text search |
| lexical_search.py | All query types (Term, Phrase, Boolean, Fuzzy, Wildcard, Range, Geo, Span) |
| vector_search.py | Semantic similarity search with embeddings |
| hybrid_search.py | Combining lexical and vector search with fusion |
| synonym_graph_filter.py | Synonym expansion in the analysis pipeline |
| search_with_openai.py | Cloud-based embeddings via OpenAI |
| multimodal_search.py | Text-to-image and image-to-image search |

This project is licensed under the MIT License. See the LICENSE file for details.