A production-grade Retrieval-Augmented Generation (RAG) system designed for high-accuracy document question answering with evidence-based citations.
Developed by Larry Stewart at Cognitive Code (cognitiveCode.ai).
NPR (Near-Perfect RAG) is a full-stack RAG system that retrieves relevant document evidence and generates answers with explicit citations. The system prioritizes:
- Evidence-first answers: Every claim is grounded in retrieved document evidence
- Citation completeness: All factual claims include source citations with page numbers
- Explicit abstention: When insufficient evidence exists, the system asks clarifying questions or abstains rather than guessing
- Reproducibility: Every response produces a replayable trace for debugging and auditing
NPR RAG System
┌─────────────────────────────────────────────────────────────────────┐
│                            ONLINE PLANE                             │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────────┐   │
│  │  Query   │───▶│ Planner  │───▶│ Retrieve │───▶│   Generate   │   │
│  │ Gateway  │    │          │    │ & Rerank │    │ (with cites) │   │
│  └──────────┘    └──────────┘    └──────────┘    └──────────────┘   │
└─────────────────────────────────────────────────────────────────────┘
                                   │
               ┌───────────────────┼───────────────────┐
               ▼                   ▼                   ▼
          ┌──────────┐        ┌──────────┐        ┌──────────┐
          │PostgreSQL│        │  Milvus  │        │  MinIO   │
          │ (Graph)  │        │ (Vectors)│        │ (Storage)│
          └──────────┘        └──────────┘        └──────────┘
┌─────────────────────────────────────────────────────────────────────┐
│                            OFFLINE PLANE                            │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────────┐   │
│  │  Ingest  │───▶│  Parse   │───▶│  Chunk   │───▶│    Embed     │   │
│  │ Document │    │ & Layout │    │ & Index  │    │   (OpenAI)   │   │
│  └──────────┘    └──────────┘    └──────────┘    └──────────────┘   │
└─────────────────────────────────────────────────────────────────────┘
| Component | Technology |
|---|---|
| Backend API | FastAPI (Python 3.11+) |
| Frontend | Next.js |
| Vector Database | Milvus |
| Relational DB | PostgreSQL |
| Object Storage | MinIO (S3-compatible) |
| Task Queue | Celery + Redis |
| Embeddings | OpenAI text-embedding-3-large |
| LLM (Chat) | Configurable (Ollama/OpenAI) |
git clone <repository-url> rag-system
cd rag-system
./dev init
./dev up

`./dev init` validates prerequisites (Docker, Python 3.10+, Node 18+, npm) and creates or syncs `backend/.env` from `backend/.env.example` without overwriting existing values.
`./dev up` runs the first-time bootstrap when needed, starts local infrastructure, then starts the backend, frontend, and Celery.
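The "sync without overwriting" behavior described for `./dev init` can be illustrated with a minimal sketch. This is not the project's actual implementation: the function names `parse_env` and `sync_env` are hypothetical, and the sketch assumes simple `KEY=VALUE` lines with `#` comments.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key, _, value = stripped.partition("=")
        values[key.strip()] = value.strip()
    return values


def sync_env(example_text: str, current_text: str) -> str:
    """Append keys that exist only in the example; never touch existing values."""
    current = parse_env(current_text)
    missing = {
        key: value
        for key, value in parse_env(example_text).items()
        if key not in current
    }
    # Keep the current file's lines verbatim, then add the missing defaults.
    lines = current_text.splitlines()
    lines += [f"{key}={value}" for key, value in missing.items()]
    return "\n".join(lines) + "\n"
```

With an example file containing `DEBUG=false` and `DB_PASSWORD=changeme`, and a current file containing only `DB_PASSWORD=s3cret`, the result keeps `DB_PASSWORD=s3cret` and appends `DEBUG=false`.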
Open:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API docs: http://localhost:8000/docs
Useful commands:
- `./dev status`
- `./dev logs app` or `./dev logs infra`
- `./dev monitor` (foreground) or `./dev monitor --daemon` (background)
- `./dev doctor`
- `./dev migrate`
- `./dev seed`
- `./dev test`
- `./dev reset --yes` (or `./dev reset --volumes --yes` to wipe service data)
- `./dev down`
Need deep setup/troubleshooting details? See SETUP.md.
- If `./dev up` fails: run `./dev doctor`, then `./dev logs infra`.
- If the API or UI is unreachable: run `./dev status`, then `./dev logs app`.
- If migrations fail: run `./dev migrate` and review the backend output.
- If startup state is corrupted: run `./dev reset --yes` (or `./dev reset --volumes --yes` to wipe data), then `./dev init` and `./dev up`.
The active monitor is opt-in and safe-by-default:
- Off by default (`MONITOR_ENABLED=false`)
- Observe-only unless `MONITOR_MODE=heal`
- Supports dry-run (`MONITOR_DRY_RUN=true`) and circuit breaker safeguards
Run it via ./dev:
# Foreground monitor (Ctrl+C to stop)
MONITOR_ENABLED=true MONITOR_MODE=observe ./dev monitor
# Background daemon monitor
MONITOR_ENABLED=true MONITOR_MODE=heal ./dev monitor --daemon
# Inspect monitor status/logs
./dev monitor --status
./dev monitor --stop
./dev logs monitor
./dev status

Monitor environment variables:
- `MONITOR_ENABLED=true|false`
- `MONITOR_MODE=observe|heal`
- `MONITOR_DRY_RUN=true|false`
- `MONITOR_INTERVAL_SECONDS`
- `MONITOR_MAX_RETRIES`
- `MONITOR_BACKOFF_SECONDS`
- `MONITOR_CIRCUIT_BREAKER_THRESHOLD`
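To make the interaction between the three on/off switches concrete, here is a small sketch of how they might combine. It is an illustration only: `MonitorConfig`, `load_monitor_config`, and `will_take_action` are hypothetical names, the `false` default for `MONITOR_DRY_RUN` is an assumption, and the real monitor may resolve these settings differently.

```python
import os
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    """Treat only the literal 'true' (any case) as enabled."""
    return value.strip().lower() == "true"


@dataclass(frozen=True)
class MonitorConfig:
    enabled: bool
    mode: str
    dry_run: bool


def load_monitor_config(env=None) -> MonitorConfig:
    """Read the monitor switches from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    mode = env.get("MONITOR_MODE", "observe")  # observe-only unless set to heal
    if mode not in ("observe", "heal"):
        raise ValueError(f"MONITOR_MODE must be 'observe' or 'heal', got {mode!r}")
    return MonitorConfig(
        enabled=_as_bool(env.get("MONITOR_ENABLED", "false")),  # off by default
        mode=mode,
        dry_run=_as_bool(env.get("MONITOR_DRY_RUN", "false")),  # ASSUMED default
    )


def will_take_action(cfg: MonitorConfig) -> bool:
    """Healing happens only when enabled, in heal mode, and not a dry run."""
    return cfg.enabled and cfg.mode == "heal" and not cfg.dry_run
```

Under these assumptions, an empty environment yields a disabled, observe-only monitor, and setting `MONITOR_DRY_RUN=true` suppresses actions even in heal mode.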
Monitor outputs:
- Structured incident log: `logs/monitor.jsonl`
- State/dedupe file: `.monitor_state.json`
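Because the incident log is JSON Lines (one JSON object per line), it can be read with a few lines of Python. The sketch below assumes nothing about the record fields, which are not documented here; `read_incidents` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def read_incidents(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per valid JSON object line; skip blanks and bad lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # A line may be partially written while the monitor is running.
            continue
        if isinstance(record, dict):
            yield record


# Usage (file path taken from the list above):
# with open("logs/monitor.jsonl", encoding="utf-8") as fh:
#     for incident in read_incidents(fh):
#         print(incident)
```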
Once running, access:
- API Docs: http://localhost:8000/docs (Swagger UI)
- Health Check: http://localhost:8000/health
- Frontend: http://localhost:3000
| Endpoint | Method | Description |
|---|---|---|
| `/v1/ingest/document` | POST | Upload and process documents |
| `/v1/qa/ask` | POST | Ask questions about documents |
| `/v1/retrieve/vector` | POST | Vector search for relevant chunks |
| `/api/query` | POST | Query endpoint (legacy) |
| `/health` | GET | System health status |
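As a rough illustration of calling the question-answering endpoint, the sketch below builds a POST request with the standard library. The request body schema is not documented here, so the `question` field is an assumption; check the Swagger UI at http://localhost:8000/docs for the real schema. `build_ask_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_ask_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /v1/qa/ask."""
    # ASSUMPTION: the payload field name "question" is illustrative only.
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/qa/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send the request against a running backend:
# with urllib.request.urlopen(build_ask_request("What is the warranty period?")) as resp:
#     print(json.load(resp))
```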
rag/
├── backend/
│   ├── app/
│   │   ├── db/           # Database models & sessions
│   │   ├── graph/        # Document graph processing
│   │   ├── llm/          # LLM clients (OpenAI)
│   │   ├── qa/           # Question answering pipeline
│   │   ├── routes/       # FastAPI routes
│   │   ├── tasks/        # Celery background tasks
│   │   └── vectordb/     # Milvus vector operations
│   ├── scripts/          # Setup and utility scripts
│   ├── tests/            # Test suite
│   └── docs/             # Documentation
├── frontend/             # Next.js frontend
├── contracts/            # JSON schema contracts
├── docker-compose.yml    # Infrastructure setup
└── lighthouse.md         # System specification
- Setup Guide - Complete setup instructions with troubleshooting
- Quick Start Guide - Condensed setup steps
- Deployment Guide - Production deployment
- System Specification - Full architecture spec
- Prompting Guide - Prompt engineering practices
See backend/.env.example for all configuration options.
Key settings:
| Variable | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | Yes | OpenAI API key for embeddings |
| `DB_PASSWORD` | Yes | PostgreSQL password |
| `MINIO_SECRET_KEY` | Yes | MinIO secret key |
| `DEBUG` | No | Enable debug mode (default: false) |
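A quick pre-flight check for the required settings in the table might look like the following sketch. It is not part of the project; `missing_settings` is a hypothetical helper that treats unset and blank values as missing.

```python
import os

# Variables marked "Yes" in the Required column above.
REQUIRED = ("OPENAI_API_KEY", "DB_PASSWORD", "MINIO_SECRET_KEY")


def missing_settings(env=None) -> list[str]:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name, "").strip()]
```

Calling `missing_settings()` with no arguments inspects the current process environment; an empty list means all required settings are present.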
cd backend
# Run all tests
pytest
# Run specific test suite
pytest tests/qa/ -v
# Run with coverage
pytest --cov=app tests/

To add a new parser:
- Add the parser in `backend/app/graph/`
- Update the chunking logic in `backend/app/graph/chunker.py`
- Add tests in `backend/tests/`
cd backend
python tests/eval/run_qa_eval.py --contract tests/eval/benchmark_contract.json

Benchmark runs are contract-gated. If dataset hashes, mode settings, or benchmark-critical flags drift from `backend/tests/eval/benchmark_contract.json`, the run exits before execution.
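The hash part of contract gating can be sketched as follows. This is only an illustration of the idea: the actual layout of `benchmark_contract.json` is not shown here, so the `dataset_hashes` key (mapping relative paths to SHA-256 digests) and the function names are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_drift(contract: dict, root: Path) -> list[str]:
    """Return dataset paths that are missing or whose hash differs from the contract."""
    drifted = []
    # ASSUMPTION: hypothetical contract shape {"dataset_hashes": {rel_path: sha256}}.
    for rel_path, expected in contract.get("dataset_hashes", {}).items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            drifted.append(rel_path)
    return drifted
```

A gated runner would call `find_drift` first and exit with an error if the returned list is non-empty, before any evaluation starts.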
To refresh benchmark contract hashes/counts after intentional benchmark file changes:
cd backend
python tests/eval/update_benchmark_contract.py

This project is licensed under the MIT License; see the LICENSE file for details.
See CONTRIBUTING.md for guidelines.