GitHelp is a conversational assistant for querying a software project's documentation, configuration files, repository structure, and Python API documentation in natural language.
The initial use case is the MMORE repository. The architecture is project-oriented, so GitHelp can also build a corpus for another Python repository selected locally or cloned from a public GitHub URL.
Full documentation: GitHelp documentation
Direct access to the interface (from the EPFL network or over VPN): Interface
GitHelp is designed for questions such as:
- How do I install, configure, or run the target project?
- How do I build an MMORE index?
- Where is a specific module, class, function, or config implemented?
- What is the signature of a function?
- What does an example configuration file look like?
- Which retrieved sources support this answer?
Answers are source-grounded: GitHelp retrieves project documents first, builds a prompt from those sources, and then optionally calls a local LLM provider.
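The retrieve-then-prompt step can be pictured with a minimal sketch. The `RetrievedSource` class, the `build_grounded_prompt` function, and the prompt wording are illustrative assumptions, not GitHelp's actual code:

```python
from dataclasses import dataclass


@dataclass
class RetrievedSource:
    # Hypothetical shape of a retrieved document; not GitHelp's actual model.
    path: str
    text: str


def build_grounded_prompt(question: str, sources: list[RetrievedSource]) -> str:
    """Assemble a prompt that restricts the answer to the retrieved sources."""
    blocks = [
        f"[{i}] {src.path}\n{src.text.strip()}"
        for i, src in enumerate(sources, start=1)
    ]
    context = "\n\n".join(blocks) if blocks else "(no sources retrieved)"
    return (
        "Answer using only the sources below and cite them as [n].\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    demo = [RetrievedSource("docs/install.md", "Run pip install -e . from the root.")]
    print(build_grounded_prompt("How do I install the project?", demo))
```

Numbering the sources in the prompt is one way to let the answer cite them, so a reader can check which retrieved source supports each claim.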
GitHelp turns a target repository into a structured DocumentRecord corpus.
The simple backend searches this corpus directly, while the MMORE workflow
exports and indexes it before retrieval. Retrieved records are then used either
by the extractive answerer or as grounded context for the local LLM.
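As a rough illustration of a record corpus stored as JSONL, the sketch below writes and reloads a few records. The field names (`doc_id`, `source_path`, `kind`, `text`) and both helper functions are assumptions made for the example; the real schema lives in `data_models.py` and may differ:

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class DocumentRecord:
    # Field names are illustrative assumptions, not GitHelp's real schema.
    doc_id: str
    source_path: str
    kind: str  # e.g. "markdown", "yaml", "python_api"
    text: str


def write_corpus(records, path):
    """Write one JSON object per line (JSONL)."""
    with Path(path).open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")


def read_corpus(path):
    """Load records back, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                records.append(DocumentRecord(**json.loads(line)))
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "corpus.jsonl"
        write_corpus([DocumentRecord("1", "README.md", "markdown", "Hello")], target)
        print(read_corpus(target))
```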
Main package layout:
src/githelp/
  config.py           typed configuration helpers
  data_models.py      DocumentRecord model
  corpus/             corpus construction
  loaders/            Markdown, YAML, repository structure loaders
  extractors/         Python docstring/signature extraction
  indexing/           MMORE JSONL export and index wrapper
  retrieval/          simple and MMORE retrieval backends
  rag/                answering, prompting, LLM providers
  project_profiles/   project-specific query/reranking/direct-answer logic
  projects/           project setup and GitHub loading workflows
  utils/              shared filesystem paths
app/                  Streamlit interface
scripts/              command-line workflows and debugging tools
docs/                 Sphinx documentation
tests/                pytest suite
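To give a feel for what docstring/signature extraction involves, here is a small sketch built on the standard-library `ast` module. The `extract_functions` helper and its output keys are hypothetical, and the code in `extractors/` may work differently:

```python
import ast


def extract_functions(source: str) -> list[dict]:
    """Return name, signature and docstring for each function in `source`."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            results.append(
                {
                    "name": node.name,
                    # ast.unparse (Python 3.9+) renders the argument list as text.
                    "signature": f"{node.name}({ast.unparse(node.args)})",
                    "docstring": ast.get_docstring(node) or "",
                }
            )
    return results


if __name__ == "__main__":
    code = 'def add(a, b=2):\n    """Add two numbers."""\n    return a + b\n'
    print(extract_functions(code))
```

Parsing with `ast` reads the source without importing it, which matters when the target repository's dependencies are not installed.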
From the repository root:
python -m pip install -e .
streamlit run app/streamlit_app.py
Then open http://localhost:8501 and:
- Select a local Python repository or enter a public GitHub repository URL.
- Build the simple index first. It creates the GitHelp corpus and is the fastest way to validate extraction and retrieval.
- Ask a question and inspect the retrieved sources.
- Optionally build the MMORE index for native MMORE/Milvus retrieval.
The two modes serve different purposes:
- Simple: lightweight, deterministic lexical retrieval; recommended for a first run and for debugging.
- MMORE: dense and sparse retrieval through MMORE and Milvus; requires the additional indexing step and is more sensitive to model dependencies and local hardware.
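A deterministic lexical retriever can be as simple as counting query-term occurrences. The sketch below shows the general idea only; the `tokenize` and `lexical_search` functions and the scoring rule are assumptions, not the scoring GitHelp's simple backend actually uses:

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter, digit or underscore."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def lexical_search(query: str, documents: dict[str, str], top_k: int = 5):
    """Rank documents by how many query-term occurrences they contain."""
    query_terms = set(tokenize(query))
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    # Sort by score descending, then by id, so ties are resolved deterministically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(doc_id, score) for score, doc_id in scored[:top_k]]


if __name__ == "__main__":
    docs = {
        "docs/index.md": "Build the index with Milvus.",
        "docs/install.md": "Install the project with pip.",
    }
    print(lexical_search("build index", docs))
```

Because the ranking involves no model and no randomness, the same query over the same corpus always returns the same order, which is what makes this kind of backend convenient for debugging.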
From the repository root:
python -m pip install -e .
For development:
python -m pytest -q
GitHelp depends on MMORE for the native mmore backend:
mmore[index,rag]==1.2.2
transformers>=4.51.0,<5
The Transformers upper bound is intentional: MMORE sparse indexing currently uses APIs that are not compatible with Transformers 5.
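A quick way to confirm that an environment respects this bound is sketched below. The helper names are hypothetical, and the version parser is deliberately simplified: it reads only the leading numeric components and ignores pre-release suffixes, so it is not a full PEP 440 comparison:

```python
from importlib import metadata


def parse_version(version: str) -> tuple[int, ...]:
    """Parse leading numeric components, padded to three: '4.51' -> (4, 51, 0)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as '0rc1' or 'dev0'
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def transformers_version_ok(version: str) -> bool:
    """Check the documented bound: >=4.51.0 and <5."""
    return (4, 51, 0) <= parse_version(version) < (5,)


def installed_transformers_ok() -> bool:
    """Return False when transformers is missing or outside the bound."""
    try:
        return transformers_version_ok(metadata.version("transformers"))
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("transformers within bound:", installed_transformers_ok())
```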
streamlit run app/streamlit_app.py
The interface lets a user:
- select a local target repository;
- clone a public GitHub repository into data/repositories/;
- build a simple corpus;
- export and build an MMORE-backed index;
- ask questions with or without the local LLM;
- inspect retrieved sources and debug metadata.
The default app configuration is:
configs/app_config.yaml
The current local app state is stored in:
data/app_state.json
This state file is generated locally and ignored by Git.
Clone and prepare a public GitHub repository with the simple backend:
python scripts/prepare_github_project.py \
https://github.com/swiss-ai/mmore
Or clone only:
python scripts/load_github_repository.py \
https://github.com/swiss-ai/mmore
GitHelp stores cloned repositories under:
data/repositories/
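Mapping a GitHub URL to a clone directory could look like the sketch below. The `repository_directory` helper and the `<owner>__<repo>` naming scheme are assumptions made for illustration; the layout GitHelp actually uses under data/repositories/ may differ:

```python
from pathlib import Path
from urllib.parse import urlparse


def repository_directory(url: str, root: str = "data/repositories") -> Path:
    """Map a public GitHub URL to a local clone directory under `root`."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.netloc.lower() != "github.com":
        raise ValueError(f"Not a public GitHub HTTPS URL: {url}")
    parts = [piece for piece in parsed.path.split("/") if piece]
    if len(parts) != 2:
        raise ValueError(f"Expected https://github.com/<owner>/<repo>: {url}")
    owner, repo = parts
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    # Hypothetical naming scheme: keep the owner to avoid name collisions.
    return Path(root) / f"{owner}__{repo}"


if __name__ == "__main__":
    print(repository_directory("https://github.com/swiss-ai/mmore"))
```

Validating the host and path before cloning keeps arbitrary URLs from producing unexpected directory names.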
Build a project-specific corpus:
python scripts/build_corpus.py \
--config data/projects/mmore/project_config.yaml \
--output-path data/projects/mmore/corpus.jsonl
Preview records:
python scripts/preview_corpus.py \
--corpus-path data/projects/mmore/corpus.jsonl \
--limit 3
Debug retrieval:
python scripts/debug_retrieval.py \
"How do I configure indexing?" \
--corpus-path data/projects/mmore/corpus.jsonl
Prepare an answer prompt:
python scripts/prepare_answer.py \
"How do I configure indexing?" \
--backend simple \
--corpus-path data/projects/mmore/corpus.jsonl \
--config-path configs/app_config.yaml
Generate an answer:
python scripts/answer_question.py \
"How do I configure indexing?" \
--backend mmore \
--llm \
--corpus-path data/projects/mmore/corpus.jsonl \
--config-path configs/app_config.yaml
Export a GitHelp corpus to MMORE's JSONL format:
python scripts/export_mmore_corpus.py \
--corpus-path data/projects/mmore/corpus.jsonl \
--output-path data/projects/mmore/mmore_corpus.jsonl
Build the MMORE index:
python scripts/build_index.py \
--documents-path data/projects/mmore/mmore_corpus.jsonl \
--collection-name mmore_docs
The native MMORE retriever is run in an isolated subprocess. If that native
process fails in a local environment, GitHelp keeps Streamlit alive and falls
back to lexical retrieval over the exported mmore_corpus.jsonl. This fallback
keeps the application usable, but it does not use native MMORE/Milvus vector
search. Retrieved sources are tagged with the actual mode:
native_index
corpus_fallback
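The isolate-and-fall-back pattern described above can be sketched as follows. Only the two mode names come from this document. The `retrieve_with_fallback` function, the `retrieval_mode` key, and the assumption that the child process prints a JSON list of objects are all hypothetical:

```python
import json
import subprocess
import sys


def retrieve_with_fallback(native_command, fallback, query, timeout=60):
    """Run native retrieval in a child process; fall back if it fails."""
    try:
        completed = subprocess.run(
            [*native_command, query],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,  # a non-zero exit code raises CalledProcessError
        )
        results = json.loads(completed.stdout)
        mode = "native_index"
    except (subprocess.SubprocessError, OSError, json.JSONDecodeError):
        # Crash, timeout, missing executable or unreadable output:
        # the parent process survives and uses the fallback instead.
        results = fallback(query)
        mode = "corpus_fallback"
    # Tag every source with the mode that actually produced it.
    return [{**item, "retrieval_mode": mode} for item in results]


if __name__ == "__main__":
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(retrieve_with_fallback(failing, lambda q: [{"id": "lexical"}], "demo"))
```

Running the native retriever in a child process means a crash there surfaces as an exception in the parent rather than taking the whole application down.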
Run retrieval evaluation on benchmark questions:
python scripts/evaluate_retrieval.py \
--questions-path tests/evaluation/githelp_eval_questions.txt \
--corpus-path data/projects/mmore/corpus.jsonl \
--backend simple \
--top-k 5
With expected-source checks:
python scripts/evaluate_retrieval.py \
--questions-path tests/evaluation/githelp_eval_questions.txt \
--expected-sources-path tests/evaluation/githelp_eval_expected_sources.example.json \
--corpus-path data/projects/mmore/corpus.jsonl \
--backend simple \
--top-k 5
Run all tests:
python -m pytest -q
Compile Python files:
python -m compileall -q src app scripts
The test suite covers corpus building, loaders, extractors, retrieval, answering, LLM provider factory behavior, GitHub loading, project setup, and MMORE adapter edge cases.
Documentation lives in docs/ and is organized into:
- getting started guides;
- architecture and data flow;
- component-level documentation;
- development and debugging notes.
If needed, build the HTML documentation locally with Sphinx from the sources in docs/ (run from the repository root):
python -m pip install -r docs/requirements.txt
sphinx-build -b html docs docs/_build/html
GitHelp can also be run with Docker.
For local testing:
docker compose -f docker-compose.local.yml up --build
Then open:
http://localhost:8501
For the EPFL GPU server deployment, GitHelp is packaged as a CUDA-enabled Docker image and served through Traefik under:
http://gpu217.rcp.epfl.ch:1312/githelp/
The deployment uses a persistent data/ volume for cloned repositories, generated corpora and MMORE indexes, and a persistent Hugging Face cache for local model files.
Detailed deployment and troubleshooting instructions are available in:
docs/deployment/cluster.md

