laurus-python

Python bindings for the Laurus search engine. Provides lexical search, vector search, and hybrid search from Python via a native Rust extension built with PyO3 and Maturin.

Features

Lexical Search -- Full-text search powered by an inverted index with BM25 scoring
Vector Search -- Approximate nearest neighbor (ANN) search using Flat, HNSW, or IVF indexes
Hybrid Search -- Combine lexical and vector results with fusion algorithms (RRF, WeightedSum)
Rich Query DSL -- Term, Phrase, Fuzzy, Wildcard, NumericRange, Geo, Boolean, Span queries
Text Analysis -- Tokenizers, filters, stemmers, and synonym expansion
Flexible Storage -- In-memory (ephemeral) or file-based (persistent) indexes
Pythonic API -- Clean, intuitive Python classes with full type information
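
To make the BM25 scoring mentioned under Lexical Search more concrete, here is a minimal pure-Python sketch of the standard formula. It does not use laurus and is not its implementation; the function name `bm25_scores` and the parameter defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "introduction to rust".split(),
    "python for data science".split(),
]
scores = bm25_scores(["rust"], docs)
```

Only the first document contains the query term, so it receives a positive score and the second scores zero.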

Installation

pip install laurus

To build from source (requires the Rust toolchain):

pip install maturin
maturin develop

Quick Start

import laurus

# Create an in-memory index
index = laurus.Index()

# Index documents
index.put_document("doc1", {"title": "Introduction to Rust", "body": "Systems programming language."})
index.put_document("doc2", {"title": "Python for Data Science", "body": "Data analysis with Python."})
index.commit()

# Search with a DSL string
results = index.search("title:rust", limit=5)
for r in results:
    print(f"[{r.id}] score={r.score:.4f}  {r.document['title']}")

# Search with a query object
results = index.search(laurus.TermQuery("body", "python"), limit=5)

Index Types

In-memory (ephemeral)

index = laurus.Index()

File-based (persistent)

schema = laurus.Schema()
schema.add_text_field("title")
schema.add_text_field("body")
schema.add_hnsw_field("embedding", dimension=384)

index = laurus.Index(path="./myindex", schema=schema)
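
As a conceptual aid for what a vector field is searched for, the sketch below ranks stored vectors by similarity to a query with an exhaustive scan, which is how a flat index is commonly understood to work. It is plain Python, not laurus code; the helper names and the choice of cosine similarity are assumptions, since this README does not state which distance metric laurus uses.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flat_search(query, vectors, limit=5):
    """Compare the query against every stored vector and keep the best matches."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy 3-dimensional vectors; a real embedding field would use e.g. dimension=384.
vectors = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.0, 1.0, 0.0],
    "doc3": [0.9, 0.1, 0.0],
}
hits = flat_search([1.0, 0.0, 0.0], vectors, limit=2)
```

Graph-based indexes such as HNSW avoid the full scan by visiting only a subset of candidates, trading some recall for speed.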

Durability / WAL

A persistent index writes every change to a write-ahead log (WAL). By default the WAL is fsync-ed on every record, so each write is fully durable. To trade some durability for higher write throughput, opt into group commit, which batches fsync calls; a crash can then lose up to the last unsynced batch, similar to SQLite's synchronous = NORMAL:

import laurus

# `schema` is the laurus.Schema built in the File-based (persistent) example.

policy = laurus.WalSyncPolicy.group(max_records=4096, max_interval_ms=1000)
index = laurus.Index(path="./myindex", schema=schema, wal_sync_policy=policy)

index.put_document("doc1", {"title": "Hello"})
index.flush_wal()  # force a durable barrier on demand
index.commit()     # also flushes the WAL

Omit wal_sync_policy (or pass laurus.WalSyncPolicy.per_record()) to keep the default per-record durability.
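
The trade-off behind group commit can be illustrated with a toy log writer that calls fsync only after a batch of records or after a time interval. This is a standalone sketch, not how laurus implements its WAL; the class `GroupCommitLog` and its record framing are hypothetical, and only the parameter names `max_records` and `max_interval_ms` are borrowed from the example above.

```python
import os
import tempfile
import time


class GroupCommitLog:
    """Append-only log that batches fsync calls (illustrative only)."""

    def __init__(self, path, max_records=4096, max_interval_ms=1000):
        self._file = open(path, "ab")
        self._max_records = max_records
        self._max_interval = max_interval_ms / 1000.0
        self._pending = 0
        self._last_sync = time.monotonic()
        self.sync_count = 0

    def append(self, record: bytes):
        # Length-prefixed record; it sits in OS buffers until the next fsync.
        self._file.write(len(record).to_bytes(4, "little") + record)
        self._pending += 1
        elapsed = time.monotonic() - self._last_sync
        if self._pending >= self._max_records or elapsed >= self._max_interval:
            self.flush()

    def flush(self):
        # Durable barrier: everything appended so far reaches stable storage.
        self._file.flush()
        os.fsync(self._file.fileno())
        self._pending = 0
        self._last_sync = time.monotonic()
        self.sync_count += 1

    def close(self):
        self.flush()
        self._file.close()


with tempfile.TemporaryDirectory() as tmp:
    log = GroupCommitLog(os.path.join(tmp, "wal.log"), max_records=3, max_interval_ms=60_000)
    for i in range(7):
        log.append(f"record-{i}".encode())
    syncs_before_close = log.sync_count  # one fsync per full batch of 3 records
    log.close()
```

With seven records and a batch size of three, only two fsync calls happen during the writes; the seventh record is the kind of write a crash could lose before the next barrier.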

Query Types

| Query class | Description |
| --- | --- |
| `TermQuery(field, term)` | Exact term match |
| `PhraseQuery(field, [terms])` | Ordered phrase match |
| `FuzzyQuery(field, term, max_edits)` | Approximate term match |
| `WildcardQuery(field, pattern)` | Wildcard pattern match (`*`, `?`) |
| `NumericRangeQuery(field, min, max)` | Numeric range (int or float) |
| `GeoDistanceQuery.within_radius(field, lat, lon, distance_m)` | Geo-distance radius search |
| `GeoBoundingBoxQuery.within_bounding_box(field, min_lat, min_lon, max_lat, max_lon)` | Geo bounding-box search |
| `Geo3dDistanceQuery.within_sphere(field, x, y, z, distance_m)` | 3D ECEF sphere search |
| `Geo3dBoundingBoxQuery.within_box(field, min_x, min_y, min_z, max_x, max_y, max_z)` | 3D ECEF AABB search |
| `Geo3dNearestQuery.k_nearest(field, x, y, z, k)` | 3D ECEF k-nearest neighbours |
| `BooleanQuery(must, should, must_not)` | Compound boolean logic |
| `SpanNearQuery(field, [terms], slop)` | Proximity / ordered span match |
| `VectorQuery(field, vector)` | Pre-computed vector similarity |
| `VectorTextQuery(field, text)` | Text-to-vector similarity (requires embedder) |
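
The 3D queries take ECEF (Earth-centred, Earth-fixed) coordinates, so latitude/longitude data usually needs converting first. The sketch below uses the standard WGS84 conversion. It is a helper written for this README, not part of laurus; that laurus expects WGS84-based ECEF in metres is an assumption, inferred only from the `distance_m` parameter name.

```python
import math

# WGS84 ellipsoid constants.
WGS84_A = 6378137.0                 # semi-major axis in metres
WGS84_F = 1 / 298.257223563         # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, alt_m=0.0):
    """Convert latitude/longitude (degrees) and altitude (metres) to ECEF x, y, z."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Prime vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + alt_m) * math.sin(lat)
    return x, y, z


# A point on the equator at the prime meridian lies on the x axis.
x, y, z = geodetic_to_ecef(0.0, 0.0)
```

The resulting `x, y, z` values could then be passed to a call such as `Geo3dDistanceQuery.within_sphere(field, x, y, z, distance_m)`, subject to the coordinate-system assumption above.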

Hybrid Search

# query_vec: a pre-computed query embedding with the same dimension as the "embedding" field
request = laurus.SearchRequest(
    lexical_query=laurus.TermQuery("body", "rust"),
    vector_query=laurus.VectorQuery("embedding", query_vec),
    fusion=laurus.RRF(k=60.0),
    limit=10,
)
results = index.search(request)
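
The request above fuses results with `RRF(k=60.0)`. As a rough illustration of what Reciprocal Rank Fusion computes, the sketch below merges two ranked ID lists using the standard formula, score = sum of 1 / (k + rank). It is standalone Python with a hypothetical function name `rrf_fuse`, not the laurus implementation.

```python
def rrf_fuse(rankings, k=60.0, limit=10):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); raw scores are ignored.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return fused[:limit]


lexical = ["doc1", "doc3", "doc2"]  # best-first lexical hits
vector = ["doc3", "doc2", "doc4"]   # best-first vector hits
fused = rrf_fuse([lexical, vector])
```

Documents that appear in both lists (`doc3`, `doc2`) outrank those found by only one side, even though `doc1` was the top lexical hit.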

Fusion algorithms

| Class | Description |
| --- | --- |
| `RRF(k=60.0)` | Reciprocal Rank Fusion (rank-based, default for hybrid) |
| `WeightedSum(lexical_weight=0.5, vector_weight=0.5)` | Score-normalised weighted sum |
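
A score-normalised weighted sum can be sketched as follows. The README does not say which normalisation laurus applies, so min-max scaling here is an assumption, and the helper names `min_max` and `weighted_sum` are hypothetical; only the weight parameter names come from the table.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_sum(lexical, vector, lexical_weight=0.5, vector_weight=0.5):
    """Combine normalised lexical and vector scores; a missing score counts as 0."""
    lex, vec = min_max(lexical), min_max(vector)
    fused = {
        doc_id: lexical_weight * lex.get(doc_id, 0.0) + vector_weight * vec.get(doc_id, 0.0)
        for doc_id in set(lex) | set(vec)
    }
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)


lexical = {"doc1": 4.0, "doc2": 2.0, "doc3": 0.0}  # BM25-like scores
vector = {"doc2": 0.9, "doc3": 0.5, "doc4": 0.1}   # similarity scores
fused = weighted_sum(lexical, vector)
```

Unlike RRF, this approach depends on the raw score magnitudes, which is why the two score scales are normalised before weighting.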

Text Analysis

syn_dict = laurus.SynonymDictionary()
syn_dict.add_synonym_group(["ml", "machine learning"])

tokenizer = laurus.WhitespaceTokenizer()
filt = laurus.SynonymGraphFilter(syn_dict, keep_original=True, boost=0.8)

tokens = tokenizer.tokenize("ml tutorial")
tokens = filt.apply(tokens)
for tok in tokens:
    print(tok.text, tok.position, tok.boost)
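
For readers who want to see the idea without installing the package, here is a simplified pure-Python sketch of synonym expansion. It mirrors the `keep_original` and `boost` options from the example above, but the `Token` dataclass and both functions are hypothetical stand-ins, and the behaviour is a simplification: each synonym is emitted as a single token at the original position, whereas a real graph filter would typically split multi-word synonyms and track their position lengths.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    position: int
    boost: float = 1.0


def whitespace_tokenize(text):
    """Split on whitespace and record each token's position."""
    return [Token(word, pos) for pos, word in enumerate(text.split())]


def expand_synonyms(tokens, groups, keep_original=True, boost=0.8):
    """Add synonyms at the same position as the token they expand."""
    out = []
    for tok in tokens:
        synonyms = [s for g in groups if tok.text in g for s in g if s != tok.text]
        if keep_original or not synonyms:
            out.append(tok)
        # Synonyms share the original position and carry a reduced boost.
        out.extend(Token(s, tok.position, boost) for s in synonyms)
    return out


tokens = expand_synonyms(
    whitespace_tokenize("ml tutorial"),
    [["ml", "machine learning"]],
)
```

Because the synonym occupies the same position as `ml`, a phrase or span query can match either form at that slot.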

Examples

Usage examples are in the examples/ directory:

| Example | Description |
| --- | --- |
| quickstart.py | Basic indexing and full-text search |
| lexical_search.py | All query types (Term, Phrase, Boolean, Fuzzy, Wildcard, Range, Geo, Span) |
| vector_search.py | Semantic similarity search with embeddings |
| hybrid_search.py | Combining lexical and vector search with fusion |
| synonym_graph_filter.py | Synonym expansion in the analysis pipeline |
| search_with_openai.py | Cloud-based embeddings via OpenAI |
| multimodal_search.py | Text-to-image and image-to-image search |

Documentation

Python Binding Guide

License

This project is licensed under the MIT License - see the LICENSE file for details.
