CognitiveCodeAI/rag-main-2

NPR - Near-Perfect RAG

A production-grade Retrieval-Augmented Generation (RAG) system designed for high-accuracy document question answering with evidence-based citations.

Developed by Larry Stewart at Cognitive Code (cognitiveCode.ai).

Overview

NPR (Near-Perfect RAG) is a full-stack RAG system that retrieves relevant document evidence and generates answers with explicit citations. The system prioritizes:

  • Evidence-first answers: Every claim is grounded in retrieved document evidence
  • Citation completeness: All factual claims include source citations with page numbers
  • Explicit abstention: When insufficient evidence exists, the system asks clarifying questions or abstains rather than guessing
  • Reproducibility: Every response produces a replayable trace for debugging and auditing
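The evidence-first and abstention priorities above can be illustrated with a minimal sketch. The `Evidence` and `Answer` shapes, the `answer_or_abstain` helper, and the `min_score` threshold are all hypothetical and are not taken from NPR's code; they only show the idea of returning citations when evidence clears a bar and abstaining otherwise.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """One retrieved passage (hypothetical shape, not NPR's schema)."""
    doc_id: str
    page: int
    text: str
    score: float  # assumed relevance score in [0, 1], higher is better


@dataclass
class Answer:
    text: str
    citations: list = field(default_factory=list)  # (doc_id, page) pairs
    abstained: bool = False


def answer_or_abstain(draft: str, evidence: list, min_score: float = 0.5) -> Answer:
    """Return a cited answer only when some evidence clears min_score."""
    supporting = [e for e in evidence if e.score >= min_score]
    if not supporting:
        # Explicit abstention: no guess and no citations.
        return Answer(text="Insufficient evidence to answer.", abstained=True)
    citations = [(e.doc_id, e.page) for e in supporting]
    return Answer(text=draft, citations=citations)
```

A real pipeline would likely also check that each claim in the draft maps to a specific passage; this sketch only gates on whether any sufficiently relevant evidence exists.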

Architecture

                                NPR RAG System
    ┌────────────────────────────────────────────────────────────────────┐
    │                            ONLINE PLANE                            │
    │  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────────┐  │
    │  │  Query   │───▶│ Planner  │───▶│ Retrieve │───▶│   Generate   │  │
    │  │ Gateway  │    │          │    │ & Rerank │    │ (with cites) │  │
    │  └──────────┘    └──────────┘    └──────────┘    └──────────────┘  │
    └────────────────────────────────────────────────────────────────────┘
                                        │
                    ┌───────────────────┼───────────────────┐
                    ▼                   ▼                   ▼
              ┌──────────┐       ┌──────────┐       ┌──────────┐
              │PostgreSQL│       │  Milvus  │       │  MinIO   │
              │ (Graph)  │       │ (Vectors)│       │ (Storage)│
              └──────────┘       └──────────┘       └──────────┘
    ┌────────────────────────────────────────────────────────────────────┐
    │                           OFFLINE PLANE                            │
    │  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────────┐  │
    │  │  Ingest  │───▶│  Parse   │───▶│  Chunk   │───▶│    Embed     │  │
    │  │ Document │    │ & Layout │    │ & Index  │    │  (OpenAI)    │  │
    │  └──────────┘    └──────────┘    └──────────┘    └──────────────┘  │
    └────────────────────────────────────────────────────────────────────┘

Tech Stack

| Component       | Technology                    |
|-----------------|-------------------------------|
| Backend API     | FastAPI (Python 3.11+)        |
| Frontend        | Next.js                       |
| Vector Database | Milvus                        |
| Relational DB   | PostgreSQL                    |
| Object Storage  | MinIO (S3-compatible)         |
| Task Queue      | Celery + Redis                |
| Embeddings      | OpenAI text-embedding-3-large |
| LLM (Chat)      | Configurable (Ollama/OpenAI)  |

Quick Start

git clone <repository-url> rag-system
cd rag-system
./dev init
./dev up

./dev init validates prerequisites (Docker, Python 3.10+, Node 18+, npm) and creates/syncs backend/.env from backend/.env.example without overwriting existing values.

./dev up runs first-time bootstrap when needed, starts local infrastructure, then starts backend, frontend, and celery.

Open:

Useful commands:

  • ./dev status
  • ./dev logs app or ./dev logs infra
  • ./dev monitor (foreground) or ./dev monitor --daemon (background)
  • ./dev doctor
  • ./dev migrate
  • ./dev seed
  • ./dev test
  • ./dev reset --yes (or ./dev reset --volumes --yes to wipe service data)
  • ./dev down

For detailed setup and troubleshooting instructions, see SETUP.md.

Troubleshooting

  • If ./dev up fails: run ./dev doctor, then ./dev logs infra.
  • If API/UI is unreachable: run ./dev status, then ./dev logs app.
  • If migrations fail: run ./dev migrate and review backend output.
  • If startup state is corrupted: run ./dev reset --yes (or ./dev reset --volumes --yes to wipe data), then ./dev init and ./dev up.

Optional Active Monitor

The active monitor is opt-in and safe-by-default:

  • Off by default (MONITOR_ENABLED=false)
  • Observe-only unless MONITOR_MODE=heal
  • Supports dry-run (MONITOR_DRY_RUN=true) and circuit breaker safeguards
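How these safeguards might combine on a single monitor tick can be sketched as follows. The function name, its arguments, and the returned action labels are hypothetical; the actual monitor may order or name these checks differently.

```python
def plan_action(mode: str, dry_run: bool, consecutive_failures: int,
                breaker_threshold: int) -> str:
    """Decide what one monitor tick does for an unhealthy service."""
    if mode != "heal":
        # Observe-only: record the incident and change nothing.
        return "log"
    if consecutive_failures >= breaker_threshold:
        # Circuit breaker: stop attempting fixes after repeated failures.
        return "circuit_open"
    if dry_run:
        # Dry run: report the fix that would be applied, without applying it.
        return "log_planned_fix"
    return "apply_fix"
```

Checking the mode first keeps the observe path free of side effects regardless of the other settings.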

Run it via ./dev:

# Foreground monitor (Ctrl+C to stop)
MONITOR_ENABLED=true MONITOR_MODE=observe ./dev monitor

# Background daemon monitor
MONITOR_ENABLED=true MONITOR_MODE=heal ./dev monitor --daemon

# Inspect monitor status, stop the daemon, and view logs
./dev monitor --status
./dev monitor --stop
./dev logs monitor
./dev status

Monitor environment variables:

  • MONITOR_ENABLED = true|false
  • MONITOR_MODE = observe|heal
  • MONITOR_DRY_RUN = true|false
  • MONITOR_INTERVAL_SECONDS
  • MONITOR_MAX_RETRIES
  • MONITOR_BACKOFF_SECONDS
  • MONITOR_CIRCUIT_BREAKER_THRESHOLD
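A minimal sketch of reading some of these variables with safe defaults is shown below. `MonitorConfig` and `load_monitor_config` are illustrative names. The `false` default for `MONITOR_ENABLED` and the `observe` default mode follow the text above, while the dry-run default and the 30-second interval are assumptions made for this example only.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorConfig:
    enabled: bool
    mode: str
    dry_run: bool
    interval_seconds: int


def _flag(value: str) -> bool:
    """Treat only the literal string 'true' (any case) as enabled."""
    return value.strip().lower() == "true"


def load_monitor_config(env=None) -> MonitorConfig:
    """Build a config from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    mode = env.get("MONITOR_MODE", "observe")
    if mode not in ("observe", "heal"):
        raise ValueError(f"MONITOR_MODE must be observe or heal, got {mode!r}")
    return MonitorConfig(
        enabled=_flag(env.get("MONITOR_ENABLED", "false")),
        mode=mode,
        dry_run=_flag(env.get("MONITOR_DRY_RUN", "false")),  # assumed default
        interval_seconds=int(env.get("MONITOR_INTERVAL_SECONDS", "30")),  # assumed default
    )
```

Rejecting unknown modes early avoids a typo such as `MONITOR_MODE=heel` silently behaving like one of the valid modes.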

Monitor outputs:

  • Structured incident log: logs/monitor.jsonl
  • State/dedupe file: .monitor_state.json
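Because the incident log is JSON Lines (one JSON object per line), it can be summarized with a few lines of Python. The record fields are not documented here, so the `service` key used for grouping is a hypothetical field name; substitute whichever key the log actually contains.

```python
import json
from collections import Counter


def summarize_incidents(lines, key="service"):
    """Count incidents per value of `key` across JSON Lines records."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        counts[record.get(key, "unknown")] += 1
    return counts
```

Since the function accepts any iterable of strings, an open file can be passed directly, for example `with open("logs/monitor.jsonl") as fh: print(summarize_incidents(fh))`.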

API Endpoints

Once running, access:

Key Endpoints

| Endpoint            | Method | Description                       |
|---------------------|--------|-----------------------------------|
| /v1/ingest/document | POST   | Upload and process documents      |
| /v1/qa/ask          | POST   | Ask questions about documents     |
| /v1/retrieve/vector | POST   | Vector search for relevant chunks |
| /api/query          | POST   | Query endpoint (legacy)           |
| /health             | GET    | System health status              |
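As a rough illustration of calling the question-answering endpoint, the sketch below builds a POST request with the standard library. The request schema is not documented in this README, so the `question` field is an assumption, as is any base URL you pass in; check the API's own schema before relying on it.

```python
import json
import urllib.request


def build_ask_request(base_url: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to /v1/qa/ask."""
    # "question" is a hypothetical field name, not a confirmed schema.
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/qa/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(...)` once the backend is running, using whatever host and port your local setup exposes.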

Project Structure

rag/
├── backend/
│   ├── app/
│   │   ├── db/          # Database models & sessions
│   │   ├── graph/       # Document graph processing
│   │   ├── llm/         # LLM clients (OpenAI)
│   │   ├── qa/          # Question answering pipeline
│   │   ├── routes/      # FastAPI routes
│   │   ├── tasks/       # Celery background tasks
│   │   └── vectordb/    # Milvus vector operations
│   ├── scripts/         # Setup and utility scripts
│   ├── tests/           # Test suite
│   └── docs/            # Documentation
├── frontend/            # Next.js frontend
├── contracts/           # JSON schema contracts
├── docker-compose.yml   # Infrastructure setup
└── lighthouse.md        # System specification

Documentation

Configuration

See backend/.env.example for all configuration options.

Key settings:

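| Variable         | Required | Description                        |
|------------------|----------|------------------------------------|
| OPENAI_API_KEY   | Yes      | OpenAI API key for embeddings      |
| DB_PASSWORD      | Yes      | PostgreSQL password                |
| MINIO_SECRET_KEY | Yes      | MinIO secret key                   |
| DEBUG            | No       | Enable debug mode (default: false) |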
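A quick preflight check for the required settings could look like the sketch below. The `missing_settings` helper is illustrative and not part of the repository; it only reports which of the required variables listed above are unset or empty.

```python
import os

# Required variables as listed in the configuration table above.
REQUIRED = ("OPENAI_API_KEY", "DB_PASSWORD", "MINIO_SECRET_KEY")


def missing_settings(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]
```

Calling `missing_settings()` with no arguments checks the current process environment; an empty list means all three required values are present.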

Testing

cd backend

# Run all tests
pytest

# Run specific test suite
pytest tests/qa/ -v

# Run with coverage
pytest --cov=app tests/

Development

Adding a New Document Type

  1. Add parser in backend/app/graph/
  2. Update chunking logic in backend/app/graph/chunker.py
  3. Add tests in backend/tests/

Running Evaluations

cd backend
python tests/eval/run_qa_eval.py --contract tests/eval/benchmark_contract.json

Benchmark runs are contract-gated. If dataset hashes, mode settings, or benchmark-critical flags drift from backend/tests/eval/benchmark_contract.json, the run exits before execution.
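The hash-drift part of contract gating can be sketched as follows. The contract layout used here (a mapping from dataset name to an expected SHA-256 hex digest) is an assumption for illustration; the real benchmark_contract.json may be structured differently and also covers mode settings and flags.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def find_drift(contract: dict, datasets: dict) -> list:
    """Return names whose content is missing or no longer matches the contract.

    contract: dataset name -> expected SHA-256 hex digest (assumed layout)
    datasets: dataset name -> raw file bytes
    """
    drifted = []
    for name, expected in contract.items():
        if name not in datasets or sha256_hex(datasets[name]) != expected:
            drifted.append(name)
    return drifted
```

A gated runner would exit before executing any benchmark whenever `find_drift` returns a non-empty list.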

To refresh benchmark contract hashes/counts after intentional benchmark file changes:

cd backend
python tests/eval/update_benchmark_contract.py

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

See CONTRIBUTING.md for guidelines.