1 change: 1 addition & 0 deletions .dockerignore
@@ -2,6 +2,7 @@ node_modules
dist
.git
.github
docs
tests
playwright-report
test-results
1 change: 1 addition & 0 deletions Dockerfile
@@ -19,6 +19,7 @@ ENV NODE_ENV=production PORT=3000
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY --from=build /app/index.html ./index.html
COPY --from=build /app/assets ./assets
COPY --from=build /app/package.json ./package.json
EXPOSE 3000
CMD ["node", "dist/server.js"]
100 changes: 46 additions & 54 deletions EXAMPLE.md
@@ -1,100 +1,92 @@
# Example: Running the Local LLM Benchmark
# Example: running Local-Bench

This example demonstrates how to use the Local-Bench tool.
A quick walkthrough of benchmarking local models and viewing the results.

## Prerequisites

1. Install Ollama: https://ollama.ai/
2. Download some models:
1. Install [Ollama](https://ollama.ai/) and start it: `ollama serve`
2. Pull a few models you can run on your hardware:

```bash
ollama pull llama2
ollama pull mistral
ollama pull phi
ollama pull gemma3:4b
ollama pull qwen3:8b
ollama pull llama3.1:8b
```

## Step 1: Run the Benchmark
## Step 1: Run the benchmark

```bash
# Benchmark default models
npm run benchmark
npm install
npm run build

# Benchmark specific installed models...
node dist/benchmark.js gemma3:4b qwen3:8b llama3.1:8b

# Or benchmark specific models
node benchmark.js llama2 mistral phi codellama
# ...or the whole curated catalog
npm run benchmark
```

Expected output:

```
=== Local LLM Benchmark Tool ===
Ollama API URL: http://localhost:11434

Models to benchmark: llama2, mistral, phi, codellama
Models to benchmark: gemma3:4b, qwen3:8b, llama3.1:8b
✓ Connected to Ollama API

Benchmarking llama2...
✓ Completed in 2.65s
✓ Generated 120 tokens
✓ Speed: 45.23 tokens/second
Benchmarking gemma3:4b...
✓ Completed in 5.25s
✓ Generated 412 tokens
✓ Speed: 78.42 tokens/second

Benchmarking mistral...
✓ Completed in 2.40s
✓ Generated 125 tokens
✓ Speed: 52.18 tokens/second
Benchmarking qwen3:8b...
✓ Completed in 9.52s
✓ Generated 498 tokens
✓ Speed: 52.31 tokens/second

Results saved to benchmark_results.csv

=== Benchmark Summary ===

Ranking (by tokens/second):
1. mistral: 52.18 tokens/s
2. llama2: 45.23 tokens/s
1. gemma3:4b: 78.42 tokens/s
2. qwen3:8b: 52.31 tokens/s
3. llama3.1:8b: 49.87 tokens/s

Done! Open index.html in a browser to view the results.
Done! Open the dashboard to view the results.
```
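
The `Speed` lines in the sample output are generated tokens divided by generation time. Below is a minimal sketch of that arithmetic. It assumes the inputs come from the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama's `/api/generate` response reports. The helper name `tokensPerSecond` is hypothetical, and Local-Bench may measure throughput differently, for example from wall-clock time.

```typescript
// Hypothetical helper, not taken from Local-Bench's source.
// evalCount: number of generated tokens (Ollama's `eval_count`).
// evalDurationNs: generation time in nanoseconds (Ollama's `eval_duration`).
function tokensPerSecond(evalCount: number, evalDurationNs: number): number {
  if (evalDurationNs <= 0) return 0; // avoid dividing by zero on empty runs
  const seconds = evalDurationNs / 1e9;
  return evalCount / seconds;
}

// 498 tokens generated in 9.52 s, as in the qwen3:8b sample above.
console.log(tokensPerSecond(498, 9.52e9).toFixed(2)); // prints "52.31"
```

Because the displayed duration is rounded to two decimals, recomputing a speed from the printed values can differ slightly from the printed speed.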

## Step 2: View Results in Web Interface
## Step 2: View results in the dashboard

```bash
# Start the web server
npm start

# Open in browser
# Navigate to http://localhost:3000
# open http://localhost:3000
```

You should see:
- Statistics cards showing total models, successful tests, average speed, and fastest model
- A bar chart comparing model performance
- A detailed table with all benchmark results
You'll see:

## Step 3: Re-run and Refresh
- Summary cards (catalog size, installed models, top intelligence, fastest measured)
- The **Model intelligence** catalog ranked by the Artificial Analysis Intelligence Index
- System specifications captured during the run
- A throughput bar chart and a detailed results table (with each model's `IQ` score)

After running new benchmarks:
1. Click the "Refresh Results" button in the web interface
2. The page will reload with updated data from the CSV file
## Step 3: Re-run and refresh

## Custom Configuration
Run more benchmarks (CLI or the **Run benchmark** button in the UI), then click **Refresh** in the dashboard to reload the latest data.

## Custom configuration

### Custom Ollama URL
```bash
# Point at a non-default Ollama
OLLAMA_API_URL=http://192.168.1.100:11434 npm run benchmark
```

### Custom Port for Web Server
```bash
# Custom dashboard port
PORT=8080 npm start
```
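
For context, here is one way a tool could resolve these two environment variables, falling back to the defaults this walkthrough uses (`http://localhost:11434` and port `3000`). This is a sketch under assumptions: `BenchConfig` and `resolveConfig` are hypothetical names, and the actual handling in Local-Bench may differ.

```typescript
// Hypothetical config resolver; names are illustrative, not Local-Bench's API.
interface BenchConfig {
  ollamaApiUrl: string;
  port: number;
}

function resolveConfig(env: Record<string, string | undefined>): BenchConfig {
  const port = Number.parseInt(env.PORT ?? "", 10);
  return {
    // Strip trailing slashes so paths like /api/tags can be appended safely.
    ollamaApiUrl: (env.OLLAMA_API_URL || "http://localhost:11434").replace(/\/+$/, ""),
    // Fall back to 3000 when PORT is unset or not a valid port number.
    port: Number.isInteger(port) && port > 0 && port < 65536 ? port : 3000,
  };
}

console.log(resolveConfig({ PORT: "8080" }));
```

In a Node process the same function could be called with `process.env`.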

## Troubleshooting

### Error: Cannot connect to Ollama API
- Make sure Ollama is running: `ollama serve`
- Check the API endpoint: `curl http://localhost:11434/api/tags`

### Error: Model not found
- List available models: `ollama list`
- Pull the missing model: `ollama pull <model-name>`

### Benchmark times out
- The default timeout is 2 minutes
- Some larger models may take longer
- Consider testing with smaller prompts or fewer models
- **Cannot connect to Ollama API** — make sure `ollama serve` is running; check `curl http://localhost:11434/api/tags`.
- **Model not found** — `ollama list` to see what's installed, then `ollama pull <model-name>`.
- **Benchmark times out** — the per-model timeout is 2 minutes; try smaller models or fewer at once.
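
The "Model not found" check can also be done up front. The sketch below takes the JSON that `curl http://localhost:11434/api/tags` returns (an object with a `models` array whose entries carry a `name`) and reports which requested models are not installed. `missingModels` is a hypothetical helper, not part of Local-Bench.

```typescript
// Subset of Ollama's /api/tags response that this sketch relies on.
interface TagsResponse {
  models: { name: string }[];
}

// Hypothetical helper: returns the requested models that are not installed.
function missingModels(requested: string[], tags: TagsResponse): string[] {
  const installed = new Set(tags.models.map((m) => m.name));
  // Ollama resolves a bare name such as "mistral" to "mistral:latest".
  const normalize = (name: string): string =>
    name.includes(":") ? name : `${name}:latest`;
  return requested.filter((name) => !installed.has(normalize(name)));
}

// Sample payload standing in for a real /api/tags response.
const sampleTags: TagsResponse = {
  models: [{ name: "gemma3:4b" }, { name: "mistral:latest" }],
};

console.log(missingModels(["gemma3:4b", "mistral", "qwen3:8b"], sampleTags));
// prints [ 'qwen3:8b' ]
```

Each name in the result is a candidate for `ollama pull <model-name>` before starting a run.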
98 changes: 54 additions & 44 deletions LLM_TESTS.md
@@ -1,44 +1,54 @@
| Name | Size | Context | Input |
| --- | --- | --- | --- |
| gemma3:270m | 292MB | 32K | Text |
| qwen3:0.6b | 523MB | 40K | Text |
| gemma3:1b | 815MB | 32K | Text |
| deepseek-r1:1.5b | 1.1GB | 128K | Text |
| llama3.2:1b | 1.3GB | 128K | Text |
| qwen3:1.7b | 1.4GB | 40K | Text |
| qwen3-vl:2b | 1.9GB | 256K | Text, Image |
| llama3.2:3b latest | 2.0GB | 128K | Text |
| qwen3:4b | 2.5GB | 256K | Text |
| gemma3:4b latest | 3.3GB | 128K | Text, Image |
| qwen3-vl:4b | 3.3GB | 256K | Text, Image |
| deepseek-r1:7b | 4.7GB | 128K | Text |
| llama3.1:8b latest | 4.9GB | 128K | Text |
| deepseek-r1:8b latest | 5.2GB | 128K | Text |
| qwen3:8b latest | 5.2GB | 40K | Text |
| qwen3-vl:8b latest | 6.1GB | 256K | Text, Image |
| gemma3:12b | 8.1GB | 128K | Text, Image |
| deepseek-r1:14b | 9.0GB | 128K | Text |
| qwen3:14b | 9.3GB | 40K | Text |
| gpt-oss:20b | 14GB | 128K | Text |
| gemma3:27b | 17GB | 128K | Text, Image |
| qwen3-coder:latest | 19GB | 256K | Text |
| qwen3-coder:30b latest | 19GB | 256K | Text |
| qwen3:30b | 19GB | 256K | Text |
| deepseek-r1:32b | 20GB | 128K | Text |
| qwen3:32b | 20GB | 40K | Text |
| qwen3-vl:30b | 20GB | 256K | Text, Image |
| qwen3-vl:32b | 21GB | 256K | |
| deepseek-r1:70b | 43GB | 128K | Text |
| llama3.1:70b | 43GB | 128K | Text |
| gpt-oss:120b | 65GB | 128K | Text |
| llama4:16x17b latest | 67GB | 10M | Text, Image |
| GLM-4.6:TQ1_0 | 84GB | 198K | Text |
| qwen3:235b | 142GB | 256K | Text |
| qwen3-vl:235b | 143GB | 256K | |
| GLM-4.6:Q4_K_M | 216GB | 198K | Text |
| llama3.1:405b | 243GB | 128K | Text |
| llama4:128x17b | 245GB | 1M | Text, Image |
| qwen3-coder:480b | 290GB | 256K | Text |
| deepseek-v3.1:671b latest | 404GB | 160K | Text |
| deepseek-r1:671b | 404GB | 160K | Text |
| minmax m2 | 968GB | 200K | Text |
# Curated Companion model catalog

The models Local-Bench benchmarks and ranks by default. The **Intelligence** column
is the [Artificial Analysis Intelligence Index](https://artificialanalysis.ai/)
(higher = more capable; snapshot `2026-06`). `—` means the model is not individually
rated by the index (vision-only or very small models).

This table is generated from `SUPPORTED_OLLAMA_MODELS` in [`src/benchmark.ts`](src/benchmark.ts) —
edit the `intelligenceIndex` values there to override the scores.

| Model | Size | Context | Inputs | Intelligence |
| --- | --- | --- | --- | --- |
| gemma3:270m | 292MB | 32K | Text | — |
| qwen3:0.6b | 523MB | 40K | Text | — |
| gemma3:1b | 815MB | 32K | Text | — |
| deepseek-r1:1.5b | 1.1GB | 128K | Text | — |
| llama3.2:1b | 1.3GB | 128K | Text | — |
| qwen3:1.7b | 1.4GB | 40K | Text | 3 |
| qwen3-vl:2b | 1.9GB | 256K | Text, Image | — |
| llama3.2:3b | 2.0GB | 128K | Text | 4 |
| qwen3:4b | 2.5GB | 256K | Text | 6 |
| gemma3:4b | 3.3GB | 128K | Text, Image | 4 |
| qwen3-vl:4b | 3.3GB | 256K | Text, Image | — |
| deepseek-r1:7b | 4.7GB | 128K | Text | 8 |
| llama3.1:8b | 4.9GB | 128K | Text | 8 |
| deepseek-r1:8b | 5.2GB | 128K | Text | 9 |
| qwen3:8b | 5.2GB | 40K | Text | 9 |
| qwen3-vl:8b | 6.1GB | 256K | Text, Image | — |
| gemma3:12b | 8.1GB | 128K | Text, Image | 7 |
| deepseek-r1:14b | 9.0GB | 128K | Text | 13 |
| qwen3:14b | 9.3GB | 40K | Text | 11 |
| gpt-oss:20b | 14GB | 128K | Text | 24 |
| gemma3:27b | 17GB | 128K | Text, Image | 10 |
| qwen3-coder:latest | 19GB | 256K | Text | 20 |
| qwen3-coder:30b | 19GB | 256K | Text | 20 |
| qwen3:30b | 19GB | 256K | Text | 15 |
| deepseek-r1:32b | 20GB | 128K | Text | 18 |
| qwen3:32b | 20GB | 40K | Text | 15 |
| qwen3-vl:30b | 20GB | 256K | Text, Image | — |
| qwen3-vl:32b | 21GB | 256K | Text, Image | — |
| deepseek-r1:70b | 43GB | 128K | Text | 20 |
| llama3.1:70b | 43GB | 128K | Text | 16 |
| gpt-oss:120b | 65GB | 128K | Text | 33 |
| llama4:16x17b | 67GB | 10M | Text, Image | 13 |
| GLM-4.6:TQ1_0 | 84GB | 198K | Text | 30 |
| qwen3:235b | 142GB | 256K | Text | 45 |
| qwen3-vl:235b | 143GB | 256K | Text, Image | — |
| GLM-4.6:Q4_K_M | 216GB | 198K | Text | 30 |
| llama3.1:405b | 243GB | 128K | Text | 17 |
| llama4:128x17b | 245GB | 1M | Text, Image | 18 |
| qwen3-coder:480b | 290GB | 256K | Text | 24 |
| deepseek-v3.1:671b | 404GB | 160K | Text | 28 |
| deepseek-r1:671b | 404GB | 160K | Text | 27 |
| minmax m2 | 968GB | 200K | Text | 44 |
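
The new intro says the table is generated from `SUPPORTED_OLLAMA_MODELS`. A rough sketch of what such a generator could look like follows. Only `intelligenceIndex` is named in the text; the other field names (`name`, `size`, `context`, `inputs`) and the function `renderCatalogTable` are assumptions for illustration, not the actual shape in `src/benchmark.ts`.

```typescript
// Hypothetical catalog entry; only `intelligenceIndex` is named in the document.
interface CatalogEntry {
  name: string;
  size: string;
  context: string;
  inputs: string[];
  intelligenceIndex?: number; // omitted when the model is not individually rated
}

// Renders the catalog as a Markdown table with the same columns as above.
function renderCatalogTable(models: CatalogEntry[]): string {
  const header = [
    "| Model | Size | Context | Inputs | Intelligence |",
    "| --- | --- | --- | --- | --- |",
  ];
  const rows = models.map(
    (m) =>
      `| ${m.name} | ${m.size} | ${m.context} | ${m.inputs.join(", ")} | ${m.intelligenceIndex ?? "—"} |`,
  );
  return [...header, ...rows].join("\n");
}

console.log(
  renderCatalogTable([
    { name: "gemma3:270m", size: "292MB", context: "32K", inputs: ["Text"] },
    { name: "gemma3:4b", size: "3.3GB", context: "128K", inputs: ["Text", "Image"], intelligenceIndex: 4 },
  ]),
);
```

Keeping the score optional lets unrated models fall through to the `—` placeholder without a special case per row.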