Skip to content

Latest commit

 

History

History

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

README.md

laurus-python

PyPI License: MIT

Python bindings for the Laurus search engine. Provides lexical search, vector search, and hybrid search from Python via a native Rust extension built with PyO3 and Maturin.

Features

  • Lexical Search -- Full-text search powered by an inverted index with BM25 scoring
  • Vector Search -- Approximate nearest neighbor (ANN) search using Flat, HNSW, or IVF indexes
  • Hybrid Search -- Combine lexical and vector results with fusion algorithms (RRF, WeightedSum)
  • Rich Query DSL -- Term, Phrase, Fuzzy, Wildcard, NumericRange, Geo, Boolean, Span queries
  • Text Analysis -- Tokenizers, filters, stemmers, and synonym expansion
  • Flexible Storage -- In-memory (ephemeral) or file-based (persistent) indexes
  • Pythonic API -- Clean, intuitive Python classes with full type information

Installation

pip install laurus

To build from source (requires Rust toolchain):

pip install maturin
maturin develop

Quick Start

import laurus

# Create an in-memory index
index = laurus.Index()

# Index documents
index.put_document("doc1", {"title": "Introduction to Rust", "body": "Systems programming language."})
index.put_document("doc2", {"title": "Python for Data Science", "body": "Data analysis with Python."})
index.commit()

# Search with a DSL string
results = index.search("title:rust", limit=5)
for r in results:
    print(f"[{r.id}] score={r.score:.4f}  {r.document['title']}")

# Search with a query object
results = index.search(laurus.TermQuery("body", "python"), limit=5)

Index Types

In-memory (ephemeral)

index = laurus.Index()

File-based (persistent)

schema = laurus.Schema()
schema.add_text_field("title")
schema.add_text_field("body")
schema.add_hnsw_field("embedding", dimension=384)

index = laurus.Index(path="./myindex", schema=schema)

Durability / WAL

A persistent index writes every change to a write-ahead log (WAL). By default the WAL is fsync-ed on every record, so each write is fully durable. Opt into group commit to batch fsync for higher write throughput (a crash can lose up to the last unsynced batch, like SQLite's synchronous = NORMAL):

import laurus

policy = laurus.WalSyncPolicy.group(max_records=4096, max_interval_ms=1000)
index = laurus.Index(path="./myindex", schema=schema, wal_sync_policy=policy)

index.put_document("doc1", {"title": "Hello"})
index.flush_wal()  # force a durable barrier on demand
index.commit()     # also flushes the WAL

Omit wal_sync_policy (or pass laurus.WalSyncPolicy.per_record()) to keep the default per-record durability.

Query Types

Query class Description
TermQuery(field, term) Exact term match
PhraseQuery(field, [terms]) Ordered phrase match
FuzzyQuery(field, term, max_edits) Approximate term match
WildcardQuery(field, pattern) Wildcard pattern match (*, ?)
NumericRangeQuery(field, min, max) Numeric range (int or float)
GeoDistanceQuery.within_radius(field, lat, lon, distance_m) Geo-distance radius search
GeoBoundingBoxQuery.within_bounding_box(field, min_lat, min_lon, max_lat, max_lon) Geo bounding-box search
Geo3dDistanceQuery.within_sphere(field, x, y, z, distance_m) 3D ECEF sphere search
Geo3dBoundingBoxQuery.within_box(field, min_x, min_y, min_z, max_x, max_y, max_z) 3D ECEF AABB search
Geo3dNearestQuery.k_nearest(field, x, y, z, k) 3D ECEF k-nearest neighbours
BooleanQuery(must, should, must_not) Compound boolean logic
SpanNearQuery(field, [terms], slop) Proximity / ordered span match
VectorQuery(field, vector) Pre-computed vector similarity
VectorTextQuery(field, text) Text-to-vector similarity (requires embedder)

Hybrid Search

request = laurus.SearchRequest(
    lexical_query=laurus.TermQuery("body", "rust"),
    vector_query=laurus.VectorQuery("embedding", query_vec),
    fusion=laurus.RRF(k=60.0),
    limit=10,
)
results = index.search(request)

Fusion algorithms

Class Description
RRF(k=60.0) Reciprocal Rank Fusion (rank-based, default for hybrid)
WeightedSum(lexical_weight=0.5, vector_weight=0.5) Score-normalised weighted sum

Text Analysis

syn_dict = laurus.SynonymDictionary()
syn_dict.add_synonym_group(["ml", "machine learning"])

tokenizer = laurus.WhitespaceTokenizer()
filt = laurus.SynonymGraphFilter(syn_dict, keep_original=True, boost=0.8)

tokens = tokenizer.tokenize("ml tutorial")
tokens = filt.apply(tokens)
for tok in tokens:
    print(tok.text, tok.position, tok.boost)

Examples

Usage examples are in the examples/ directory:

Example Description
quickstart.py Basic indexing and full-text search
lexical_search.py All query types (Term, Phrase, Boolean, Fuzzy, Wildcard, Range, Geo, Span)
vector_search.py Semantic similarity search with embeddings
hybrid_search.py Combining lexical and vector search with fusion
synonym_graph_filter.py Synonym expansion in the analysis pipeline
search_with_openai.py Cloud-based embeddings via OpenAI
multimodal_search.py Text-to-image and image-to-image search

Documentation

License

This project is licensed under the MIT License - see the LICENSE file for details.