194 passing tests. Zero external dependencies.

A full-text search engine built from scratch in Python: inverted indexing, BM25 ranking, Porter stemming, recursive descent query parsing, faceted search, and persistent storage, all using only the standard library.

Core modules: 16
Avg query time: 0.26 ms
Lines of code: ~4,400
Dependencies: 0

Quick start

from searchlite import SearchEngine, Schema, TextField, KeywordField

engine = SearchEngine(schema=Schema(
    title=TextField(boost=2.0),
    body=TextField(),
    tags=KeywordField(faceted=True),
))

engine.add({
    "title": "Building Data Pipelines",
    "body": "A guide to ETL with Python and Apache Kafka",
    "tags": ["data-engineering", "python"],
})

# BM25-ranked results with highlighting
results = engine.search("python AND kafka")
for hit in results:
    print(hit.score, hit.highlight("body"))

What's under the hood

Inverted Index

Term → document mapping with position tracking, term frequencies, and field-level storage. Add, remove, and serialize document sets.
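As a rough illustration of the term → document mapping described above, here is a minimal positional index. The class name `TinyInvertedIndex` and its methods are hypothetical and are not taken from searchlite's source; tokenization is a plain whitespace split to keep the sketch short.

```python
from collections import defaultdict


class TinyInvertedIndex:
    """Illustrative positional index; not searchlite's actual class."""

    def __init__(self):
        # term -> {doc_id -> [positions of the term in that document]}
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        # Record every position so phrase queries can check adjacency later.
        for position, term in enumerate(text.lower().split()):
            self.postings[term].setdefault(doc_id, []).append(position)

    def remove(self, doc_id):
        # Drop the document from every posting list, pruning empty terms.
        for term in list(self.postings):
            self.postings[term].pop(doc_id, None)
            if not self.postings[term]:
                del self.postings[term]

    def term_frequency(self, term, doc_id):
        # Term frequency is just the number of recorded positions.
        return len(self.postings.get(term, {}).get(doc_id, []))


index = TinyInvertedIndex()
index.add(1, "python data pipelines with python")
index.add(2, "data engineering")
print(index.postings["python"])        # {1: [0, 4]}
print(index.term_frequency("data", 2)) # 1
```

Storing positions rather than bare counts costs more memory, but it is what makes exact phrase matching possible.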

BM25 & TF-IDF Scoring

Okapi BM25 with tunable k1 and b parameters. Length normalization, field boosting, and diminishing-return term frequency saturation.
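To make the saturation and length-normalization behavior concrete, the sketch below scores a single term with one common BM25 formulation (the variant with `1 +` inside the IDF logarithm). The function name and the defaults `k1=1.2`, `b=0.75` are conventional assumptions, not values confirmed from searchlite.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one term in one document (hypothetical helper, common BM25 variant)."""
    # Rare terms get a higher IDF; the "1 +" keeps the value non-negative.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: b=0 ignores length, b=1 normalizes fully.
    norm = 1 - b + b * (doc_len / avg_doc_len)
    # Saturation: the fraction approaches (k1 + 1) as tf grows.
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)


# Each extra occurrence helps less than the previous one.
for tf in (1, 2, 3, 10):
    print(tf, round(bm25_term_score(tf, 100, 100, 1000, 50), 3))
```

A field boost such as `TextField(boost=2.0)` could then be applied by multiplying the per-field score, though how searchlite combines fields is not shown here.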

Porter Stemmer

All five steps of the original Porter stemming algorithm. "running" → "run", "connected" → "connect", "generalization" → "gener".
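Most Porter rules are conditioned on the "measure" m of the stem, the number of vowel-consonant sequences in the form [C](VC)^m[V]. The sketch below computes m and applies a single rule from step 2 of the published algorithm ("ational" → "ate" when m > 0). The helper names are hypothetical, and this is one rule only, not the full five-step stemmer.

```python
def is_consonant(word, i):
    """Porter's definition: 'y' is a consonant unless it follows a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-to-consonant transitions, i.e. m in [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def apply_rule(word, suffix, replacement, min_measure):
    """Replace suffix only if the remaining stem has measure > min_measure."""
    if word.endswith(suffix):
        stem = word[: len(word) - len(suffix)]
        if measure(stem) > min_measure:
            return stem + replacement
    return word


print(measure("trouble"))                             # 1
print(apply_rule("relational", "ational", "ate", 0))  # relate
print(apply_rule("rational", "ational", "ate", 0))    # rational (stem "r" has m = 0)
```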

Query Parser

Recursive descent parser handling AND, OR, NOT, phrase queries, wildcards, field-specific search, boost operators, and parenthesized grouping.
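The core of a recursive descent parser for this kind of grammar is one function per precedence level. The sketch below handles OR, AND (explicit or implicit), NOT, quoted phrases, and parentheses, and returns a nested tuple as the syntax tree. It is an illustrative assumption about the approach, not searchlite's parser: wildcards, `field:` prefixes, and `^boost` are left inside the term text.

```python
import re

# Parentheses, quoted phrases, or runs of non-space, non-paren characters.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()]+')


def parse(query):
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():  # lowest precedence
        node = parse_and()
        while peek() == "OR":
            advance()
            node = ("OR", node, parse_and())
        return node

    def parse_and():  # explicit AND, or two adjacent operands
        node = parse_not()
        while peek() not in (None, "OR", ")"):
            if peek() == "AND":
                advance()
            node = ("AND", node, parse_not())
        return node

    def parse_not():  # unary prefix operator
        if peek() == "NOT":
            advance()
            return ("NOT", parse_not())
        return parse_atom()

    def parse_atom():  # term, phrase, or parenthesized group
        tok = peek()
        if tok is None or tok in ("AND", "OR", ")"):
            raise ValueError(f"unexpected token: {tok!r}")
        advance()
        if tok == "(":
            node = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return node
        if tok.startswith('"'):
            return ("PHRASE", tok.strip('"'))
        return ("TERM", tok)

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return tree


print(parse("(python OR java) AND data"))
```

Because `parse_or` calls `parse_and`, which calls `parse_not`, NOT binds tighter than AND and AND binds tighter than OR without any explicit precedence table.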

Faceted Search

Multi-field facet counting with value filtering. Search for "python", get back the distribution of tags across the hits, then narrow the results by a facet value, much like Elasticsearch aggregations.
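Facet counting amounts to tallying field values across the documents that matched a query. The following sketch assumes hits are plain dicts with list-valued keyword fields; `facet_counts` and `filter_by_facet` are hypothetical helpers, not searchlite functions.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how often each value of a keyword field appears in the hits."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.get(field, []))
    return counts


def filter_by_facet(docs, field, value):
    """Keep only the hits whose field contains the chosen facet value."""
    return [doc for doc in docs if value in doc.get(field, [])]


hits = [
    {"title": "Building Data Pipelines", "tags": ["data-engineering", "python"]},
    {"title": "Python Tricks", "tags": ["python"]},
    {"title": "Kafka Basics", "tags": ["data-engineering", "kafka"]},
]

print(dict(facet_counts(hits, "tags")))
print([d["title"] for d in filter_by_facet(hits, "tags", "kafka")])
```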

Persistent Storage

JSON-based segment files with metadata tracking and compaction. Save your index to disk, reload it later. Context-manager support for clean shutdown.
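One way to picture segment-based persistence: each commit writes a new JSON file, loading merges the files in order, and compaction rewrites them as a single segment. The class below is a minimal sketch under those assumptions. The name `JsonSegmentStore`, the file naming scheme, and the flush-on-exit behavior are all hypothetical and may differ from searchlite's storage layer.

```python
import json
import os
import tempfile


class JsonSegmentStore:
    """Illustrative append-only JSON segment store (not searchlite's class)."""

    def __init__(self, directory):
        self.directory = directory
        self.pending = {}
        os.makedirs(self.directory, exist_ok=True)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Flush buffered documents on a clean exit only.
        if exc_type is None and self.pending:
            self.commit()
        return False

    def put(self, doc_id, doc):
        self.pending[str(doc_id)] = doc  # JSON object keys must be strings

    def _segments(self):
        names = os.listdir(self.directory)
        return sorted(n for n in names if n.startswith("segment_"))

    def commit(self):
        name = f"segment_{len(self._segments()):05d}.json"
        with open(os.path.join(self.directory, name), "w", encoding="utf-8") as f:
            json.dump(self.pending, f)
        self.pending = {}

    def load(self):
        docs = {}
        for name in self._segments():  # later segments override earlier ones
            with open(os.path.join(self.directory, name), encoding="utf-8") as f:
                docs.update(json.load(f))
        return docs

    def compact(self):
        merged = self.load()
        for name in self._segments():
            os.remove(os.path.join(self.directory, name))
        self.pending = merged
        self.commit()


with tempfile.TemporaryDirectory() as tmp:
    with JsonSegmentStore(tmp) as store:
        store.put(1, {"title": "Building Data Pipelines"})
        store.commit()
        store.put(2, {"title": "Kafka Basics"})  # flushed by __exit__
    reopened = JsonSegmentStore(tmp)
    reopened.compact()
    print(reopened.load())
```

A production store would also need crash-safe writes (for example, writing to a temporary file and renaming it), which this sketch omits.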

Query syntax

Query                        What it does
python data                  Implicit AND: both terms required
python OR java               Either term matches
NOT python                   Exclude documents containing "python"
"machine learning"           Exact phrase with positional matching
title:python                 Search only the title field
title:"data science"         Phrase in a specific field
pyth*                        Prefix wildcard expansion
python^2.0                   Boost a term's weight
(python OR java) AND data    Grouped boolean logic

Architecture

Layer          Component         Role
API Layer      SearchEngine      High-level interface: add, search, commit, stats
Execution      Searcher          Query → postings → score → rank
Ranking        BM25 Scorer       TF saturation, IDF, length norm
Parsing        Query Parser      Recursive descent with precedence
Display        Highlighter       Best-passage snippet extraction
Core           Inverted Index    Posting lists, positions, term lookup
Analysis       Analyzer          Tokenize → normalize → stem → filter
Schema         Field Types       Text, Keyword, Numeric definitions
Persistence    Storage           JSON segments with compaction
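The Analysis stage listed above (tokenize → normalize → stem → filter) can be pictured as a chain of small functions. The sketch below follows that order, using a deliberately crude suffix stripper as a stand-in for the Porter stemmer and a tiny stop-word list. All names and the stop-word set are assumptions for illustration only.

```python
import re

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = frozenset({"a", "an", "and", "the", "to", "with", "of"})


def tokenize(text):
    """Split text into alphanumeric runs."""
    return re.findall(r"[A-Za-z0-9]+", text)


def normalize(tokens):
    """Lowercase every token so matching is case-insensitive."""
    return [t.lower() for t in tokens]


def toy_stem(token):
    """Crude stand-in for Porter: strip one common suffix, keep >= 3 chars."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Tokenize, normalize, stem, then drop stop words."""
    tokens = normalize(tokenize(text))
    stems = [toy_stem(t) for t in tokens]
    return [s for s in stems if s not in STOP_WORDS]


print(analyze("A guide to ETL with Python and Apache Kafka"))
# ['guide', 'etl', 'python', 'apache', 'kafka']
```

The same `analyze` function has to run on both documents at index time and query terms at search time; otherwise a query for "pipelines" would never match the indexed stem.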