A FastAPI application that provides a chatbot interface powered by Ollama LLM with comprehensive monitoring using Prometheus metrics.
- AI-Powered Chat Interface: Uses Ollama LLM (Llama 3.2 1B model) to respond to user questions
- Conversation Context: Maintains conversation history for contextual responses
- Redis Caching: Caches responses for improved performance
- Prometheus Metrics: Comprehensive monitoring of:
  - Request counts and latency
  - CPU and memory usage
  - Error tracking
  - Active request monitoring
- Logging: Detailed logging for debugging and performance tracking
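To make the response caching feature more concrete, here is a minimal cache-aside sketch. It is not taken from main.py: the key scheme, the 300-second TTL, and the names `InMemoryCache`, `cache_key`, and `cached_answer` are all hypothetical, and the in-memory class only stands in for a Redis client by exposing `get` and `setex`.

```python
import hashlib
import time


class InMemoryCache:
    """Tiny stand-in exposing the two Redis-style calls used below (hypothetical)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        value, expires_at = self._store.get(key, (None, 0.0))
        if value is not None and time.monotonic() >= expires_at:
            del self._store[key]  # entry has expired
            return None
        return value

    def setex(self, key, ttl_seconds, value):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


def cache_key(question):
    """Normalize the question so trivial variations share one entry."""
    normalized = " ".join(question.lower().split())
    return "chat:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cached_answer(cache, question, generate, ttl_seconds=300):
    """Return (answer, was_cached), calling generate() only on a cache miss."""
    key = cache_key(question)
    hit = cache.get(key)
    if hit is not None:
        return hit, True
    answer = generate(question)
    cache.setex(key, ttl_seconds, answer)
    return answer, False
```

Here `generate` would be whatever function calls the LLM. Note that a key built from the question alone ignores conversation history, so a real implementation may need to fold some context into the key.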
- Python 3.9+
- Redis server
- Ollama with Llama 3.2 1B model installed
- Docker (optional, for containerization)
This project uses Poetry for dependency management:
- Clone this repository:

      git clone https://github.com/sattensil/FastAPIChatbotWithPrometheus.git
      cd FastAPIChatbotWithPrometheus

- Install dependencies with Poetry:

      # If you don't have Poetry installed
      curl -sSL https://install.python-poetry.org | python3 -
      # Install dependencies
      poetry install
      # Activate the virtual environment
      poetry env use python
      poetry env activate
Alternatively, install with pip:

- Clone this repository:

      git clone https://github.com/sattensil/FastAPIChatbotWithPrometheus.git
      cd FastAPIChatbotWithPrometheus

- Install dependencies:

      pip install -r requirements.txt

To run the application:

- Make sure Redis is running:

      redis-server

- Make sure Ollama is installed and the Llama 3.2 1B model is available:

      ollama pull llama3.2:1b

- Start the application:

      # If using Poetry
      poetry run uvicorn main:app --reload
      # If using pip
      uvicorn main:app --reload
- Build the Docker image:

      docker build -t my-fastapi-app .

- Run the container:

      docker run -p 8000:8000 my-fastapi-app
The Docker Compose setup includes FastAPI, Redis, Prometheus, and Grafana, all configured and ready to use:
- Start all services:

      docker-compose up

- Access the services:
  - Chat Interface: http://localhost:8000
  - Prometheus: http://localhost:9090
  - Grafana: http://localhost:3000 (username: admin, password: admin)
  - API Documentation: http://localhost:8000/docs
- POST /chat: Send a question to the chatbot

      { "question": "Your question here" }

- GET /metrics: Prometheus metrics endpoint for monitoring
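As an illustration of calling the chat endpoint from code, the following sketch uses only the Python standard library. The helper names `build_chat_request` and `ask`, the default base URL, and the 60-second timeout are assumptions. The shape of the response JSON is not documented here, so it is returned as-is.

```python
import json
import urllib.request


def build_chat_request(question, base_url="http://localhost:8000"):
    """Build a POST /chat request carrying {"question": ...} as JSON."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, base_url="http://localhost:8000", timeout=60):
    """Send the question and return the decoded JSON response unchanged."""
    request = build_chat_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the application running locally, `ask("What can you tell me about FastAPI?")` should return the parsed response body.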
The application exposes various Prometheus metrics at the /metrics endpoint:
- apiserver_request_total: Total number of requests
- apiserver_request_latency_seconds: Latency of requests in seconds
- apiserver_request_errors_total: Total number of errors
- apiserver_active_requests: Number of active requests
- apiserver_request_duration_seconds_bucket: Duration of API server requests
- node_cpu_usage_percent: Total CPU usage percentage
- node_memory_total_bytes: Total memory in bytes
- node_memory_free_bytes: Free memory in bytes
- And many more system metrics
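The /metrics endpoint returns plain text in the Prometheus exposition format, which is easy to inspect from a script. The parser below is a simplified sketch: it ignores HELP/TYPE comments and timestamps, and it keeps labels as part of the series name. The sample values and the function name `parse_metrics` are made up for illustration.

```python
def parse_metrics(text):
    """Map each sample (name plus any label set) to its float value."""
    samples = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            series, *rest = line.split()
        samples[series] = float(rest[0])  # rest[1], if present, is a timestamp
    return samples


# Illustrative payload only; real values come from the running application.
SAMPLE = """\
# HELP apiserver_request_total Total number of requests
# TYPE apiserver_request_total counter
apiserver_request_total 42.0
apiserver_active_requests 1.0
apiserver_request_duration_seconds_bucket{le="0.5"} 30.0
apiserver_request_duration_seconds_bucket{le="+Inf"} 42.0
"""

metrics = parse_metrics(SAMPLE)
```

For anything beyond quick inspection, a dedicated parser from a Prometheus client library is likely a better choice than this hand-rolled version.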
This project includes a complete monitoring stack setup with Prometheus and Grafana:
- Install Prometheus (if not already installed):

      brew install prometheus

- Use the provided configuration file:

      prometheus --config.file=prometheus_local.yml

- Access Prometheus at: http://localhost:9090
- Install Grafana (if not already installed):

      brew install grafana

- Start Grafana:

      brew services start grafana

- Access Grafana at: http://localhost:3000 (default credentials: admin/admin)
- Add Prometheus as a data source:
  - URL: http://localhost:9090
  - Name: prometheus
- Import the provided dashboard:
  - Go to Dashboards > Import
  - Upload the fastapi_dashboard.json file
This setup provides comprehensive visualization of all metrics collected by your FastAPI application.
- Redis Connection Error:
  - Ensure Redis is running on localhost:6379
  - Check for error messages in the logs
- Ollama Model Not Found:
  - Verify the model is installed by running: ollama list
  - Pull the model if needed: ollama pull llama3.2:1b
- Port Already in Use:
  - Change the port by running: uvicorn main:app --reload --port 8001
- Import Errors:
  - Ensure all dependencies are installed correctly
  - Try reinstalling by running: pip install -r requirements.txt
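Several of the issues above come down to whether something is listening on a given port. The sketch below checks this with a plain TCP connection attempt. The helper names are hypothetical, and port 11434 for Ollama is its usual default rather than something stated in this README.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services):
    """Map each service name to whether its port is reachable."""
    return {
        name: port_is_open(host, port)
        for name, (host, port) in services.items()
    }


# Redis and app ports come from this README; the Ollama port is an assumption.
DEFAULT_SERVICES = {
    "redis": ("localhost", 6379),
    "app": ("localhost", 8000),
    "ollama": ("localhost", 11434),
}
```

Calling `check_services(DEFAULT_SERVICES)` before starting the app gives a quick picture: Redis and Ollama should be reachable, while a reachable port 8000 suggests the port is already in use.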
The chatbot currently has a fun circus expert personality! You can customize its personality and behavior by modifying the prompt templates in main.py:
- Open main.py and locate the template definitions (around line 144)
- Modify the instructions in both the template and template_new variables
- Update the welcome message in static/index.html to match your new theme
The chatbot is currently configured as "The Amazing Circusbot" with:
- Expertise in circus arts, performances, history, and culture
- A vibrant, playful personality with circus flair
- Circus-themed language and expressions
- Occasional circus facts and historical information
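To show how a persona template and the conversation history might come together in a single prompt, here is a minimal sketch. The placeholder names `{history}` and `{question}`, the five-turn limit, and the helper functions are assumptions for illustration and may not match what main.py does.

```python
# Hypothetical template: the placeholder names {history} and {question}
# are assumptions, not necessarily what main.py uses.
TEMPLATE = """\
You are The Amazing Circusbot, an expert in circus arts, performances,
history, and culture. Answer with playful circus flair.

Conversation so far:
{history}

Question: {question}
Answer:"""


def format_history(turns, max_turns=5):
    """Render the most recent (question, answer) pairs as plain text."""
    recent = turns[-max_turns:]
    if not recent:
        return "(no previous messages)"
    return "\n".join(f"User: {q}\nBot: {a}" for q, a in recent)


def build_prompt(template, turns, question):
    """Fill the persona template with recent history and the new question."""
    return template.format(history=format_history(turns), question=question)


prompt = build_prompt(
    TEMPLATE,
    [("Hi", "Step right up!")],
    "Who invented the trapeze?",
)
```

Capping the history keeps the prompt within the model's context window. If a custom template needs literal curly braces, they must be doubled when using `str.format`.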
You can replace the current template with any of these examples or create your own:
    template = """
    You are Cosmo, an interstellar explorer with knowledge of the cosmos, space travel, and alien worlds.
    Use astronomy terminology and space metaphors in your responses.
    ...
    """

    template = """
    You are Chef Byte, a master culinary expert with knowledge of global cuisines and cooking techniques.
    Sprinkle in cooking terminology and food puns in your responses.
    ...
    """

Project structure:
- main.py: Main FastAPI application with routes and middleware
- requirements.txt: Project dependencies
- Dockerfile: Docker configuration for containerization
- setup_model.ipynb: Notebook for model setup and testing
- static/index.html: Web interface for the chatbot
The application includes a user-friendly web interface for chatting with the bot:
- Access the web interface at: http://localhost:8000
- Type your question in the input field
- Press Enter or click the Send button
- View the bot's response in the chat window
The web interface also includes convenient links to all monitoring dashboards.
To test the chat endpoint with curl:

    curl -X POST "http://127.0.0.1:8000/chat" \
      -H "Content-Type: application/json" \
      -d '{"question":"What can you tell me about FastAPI?"}'

To view the metrics, open your browser and navigate to:

    http://127.0.0.1:8000/metrics

Or use curl:

    curl http://127.0.0.1:8000/metrics

This project includes a complete monitoring stack with Prometheus and Grafana:
Prometheus collects metrics from your FastAPI application:
- Access the Prometheus UI at: http://localhost:9090
- Query metrics using PromQL
- View targets and their health
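PromQL expressions can also be sent to Prometheus over its HTTP API rather than typed into the UI. The sketch below builds instant-query URLs for two example expressions over metrics this application exposes. The 5-minute rate window, the lack of label filters, and the helper name `instant_query_url` are assumptions.

```python
from urllib.parse import urlencode


def instant_query_url(promql, base_url="http://localhost:9090"):
    """Build a URL for the Prometheus instant-query endpoint (/api/v1/query)."""
    query_string = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


# Example expressions; the 5m window is an assumption, tune it to your traffic.
REQUEST_RATE = "rate(apiserver_request_total[5m])"
P95_LATENCY = (
    "histogram_quantile(0.95, "
    "sum by (le) (rate(apiserver_request_duration_seconds_bucket[5m])))"
)

request_rate_url = instant_query_url(REQUEST_RATE)
p95_latency_url = instant_query_url(P95_LATENCY)
```

Fetching either URL with curl or `urllib.request.urlopen` returns a JSON document. The same expressions can be pasted directly into the Prometheus UI query box.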
Grafana provides visualizations for your metrics:
- Access Grafana at: http://localhost:3000 (default credentials: admin/admin)
- Pre-configured dashboard for FastAPI metrics
- Panels for:
  - Request counts and latency
  - CPU and memory usage
  - Error tracking
  - Active request monitoring
The pre-configured Grafana dashboard includes:
- Request Metrics:
  - Total request count
  - Request latency (95th and 50th percentiles)
  - Active requests gauge
- System Metrics:
  - CPU usage percentage
  - Memory usage over time
  - Memory allocation breakdown
- Error Tracking:
  - Error count over time
  - API probe duration
The dashboard refreshes every 5 seconds, so all metrics stay close to real time.
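The 95th and 50th percentile latency panels are estimates derived from histogram buckets, not exact measurements. The following simplified sketch shows the general linear-interpolation idea behind such estimates. The function name and the bucket values are invented for illustration and do not come from the dashboard.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    buckets must be sorted by upper bound and end with (math.inf, total).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Cannot interpolate into +Inf; report the last finite bound.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Made-up cumulative counts: 50 requests <= 0.1s, 90 <= 0.5s, 98 <= 1.0s, 100 total.
EXAMPLE_BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
p50 = estimate_quantile(0.50, EXAMPLE_BUCKETS)
p95 = estimate_quantile(0.95, EXAMPLE_BUCKETS)
```

Because the estimate assumes an even spread inside each bucket, its accuracy depends on how closely the bucket boundaries match the real latency distribution.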