Tip
To be honest, UI design isn't my strong suit. If you're skilled in UI and interested in collaborating to improve this project, feel free to reach out!
A voice-controlled classroom assistant for teachers. Speak naturally to control slides, screens, volume, and applications — hands-free.
No keyboard or mouse required. Just say "next page", "black screen", or "open browser" to control classroom devices. Designed for Windows classroom environments, runs fully offline, and works smoothly on 4 GB RAM machines.
| Feature | Description |
|---|---|
| Voice Activity Detection | Silero VAD ONNX with configurable sensitivity, speech/silence duration, and a ring buffer that preserves ~1 s of audio history to avoid cutting off sentence starts |
| ASR Engine | SenseVoiceSmall (pure ONNX Runtime) running fully offline |
| Speaker Verification | CAM++ embedding via speakeronnx (192-dim L2-normalized vectors). Each teacher enrolls by reading 3-5 sentences; only their voice passes the similarity threshold |
| Command Matching | Three-tier strategy: exact, keyword (substring), or fuzzy (SequenceMatcher ratio). Falls back through the chain automatically, then to AI semantic matching (DeepSeek/OpenAI) when no keyword matches |
| AI Semantic Matching | Optional DeepSeek/OpenAI integration. Sends transcribed text to a configurable LLM to infer the intended command from natural language |
| Push-to-Talk & Voice Wake | Global hotkey Ctrl+Alt+V for push-to-talk; also supports pure voice wake via VAD |
| Multi-Teacher Profiles | Register, select, and delete teacher profiles at runtime through the settings dialog |
| Internationalization | English and Chinese UI; tray, settings, and pipeline logs all switch via configuration |
| Minimal GUI | System tray icon with context menu (Status, Settings, Mute, Quit); settings window with Voice Recognition / General / About tabs |
pip install -r requirements.txt
python main.pyRight-click the tray icon → Settings... → register a teacher. Start speaking: "Next Page", "Mute", "Open Baidu".
Microphone ─► AudioCapture ─► SileroVAD ─► SpeakerVerifier ─► ASR ─► CommandMatcher ─► Action
│
(SenseVoice)
│
(fallback) │
▼
AIMatcher (AI)
DeepSeek/OpenAI
- AudioCapture reads 16 kHz PCM blocks from the microphone
- SileroVAD runs an ONNX neural network on each block, accumulating speech segments
- SpeakerVerifier extracts a CAM++ embedding and compares it to the enrolled teacher's profile
- ASR (SenseVoice) transcribes the verified speech segment to text
- CommandMatcher finds the best-matching command (exact → keyword → fuzzy)
- AIMatcher (optional, configurable) falls back to an LLM (DeepSeek / OpenAI) when no keyword matches, parsing the response as JSON to determine the command
- Action executes the command — keyboard shortcut, system call, or UI action
All components are decoupled and wired together by VoicePipeline in voirol/core/pipeline.py.
| Category | Command | Action |
|---|---|---|
| Slide control | next_page, prev_page |
→ / ← |
| Display | black_screen, white_screen |
Monitor off / fullscreen white window |
| Application | open_whiteboard, open_browser, open_file |
mspaint, browser launch, file picker |
| Audio | volume_up, volume_down, mute |
System volume ±5, mute toggle |
| View | fullscreen, esc |
F11, Esc |
| Input | enter, space |
Enter, Space |
Chinese keyword lists accompany each command (e.g. 下一页 / 下一张 for next_page).
- Python 3.10+
- Windows 10/11
- 4 GB RAM minimum
Install
git clone <repo-url>
cd VoirolClass
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txtConfigure
Edit config.toml to set language, microphone device, and ASR engine:
[general]
language = "en" # or "zh"
[asr]
engine = "sensevoice"The first run will automatically download the Silero VAD model (models/silero_vad.onnx) via mirror links.
Run
.venv\Scripts\python main.pyA tray icon appears in the taskbar. Right-click to open Settings, register a teacher, and start using voice commands.
Enrollment
- Right-click tray icon → Settings...
- Go to Voice Recognition tab → Register New Teacher
- Enter a name and read the 5 sentences aloud when prompted
- The system extracts your voiceprint and saves it
After enrollment, select your profile and start speaking. Only your voice will trigger commands.
Key settings in config.toml:
| Section | Key | Default | Description |
|---|---|---|---|
[general] |
language |
en |
UI language (en / zh) |
[vad] |
threshold |
0.25 |
Speech probability threshold |
min_speech_duration |
0.5 |
Seconds of speech to trigger | |
silence_duration |
1.0 |
Seconds of silence to end utterance | |
[voice] |
verification_threshold |
0.45 |
Similarity threshold for speaker match |
model_path |
campplus-zh-en |
speakeronnx model name | |
[asr] |
engine |
sensevoice |
sensevoice, baidu, azure, or tencent |
[commands] |
match_mode |
fuzzy |
exact / keyword / fuzzy |
fuzzy_threshold |
0.8 |
SequenceMatcher ratio | |
[hotkey] |
push_to_talk |
ctrl+alt+v |
PTT hotkey |
[ui] |
font_size |
13 |
Font size (px) |
border_radius |
5 |
Widget corner radius (px) | |
[ai] |
enabled |
false |
Enable AI fallback matching |
api_url |
https://api.deepseek.com/v1 |
OpenAI-compatible API endpoint | |
model |
deepseek-chat |
Model name | |
temperature |
0.1 |
LLM temperature (0.0–2.0) |
voirol/
├── ai/ # AI command matcher (DeepSeek/OpenAI)
├── asr/ # SenseVoice ASR engine
├── audio/ # Capture, VAD, preprocessing
├── command/ # Command registry, matcher, actions
├── core/ # Config & VoicePipeline
├── gui/ # System tray & settings dialog (PyQt6)
├── utils/ # i18n, logging, download helpers
└── voice/ # Speaker verification & enrollment
| Component | Library | Notes |
|---|---|---|
| GUI | PyQt6 | System tray + settings dialog |
| Audio capture | sounddevice | Callback-based PCM stream |
| VAD | Silero VAD ONNX | via onnxruntime |
| ASR | SenseVoiceSmall ONNX | Offline |
| Speaker verification | speakeronnx | CAM++ model, 192-dim embeddings |
| Command execution | pyautogui | Keyboard/mouse simulation |
| AI matching | DeepSeek / OpenAI API | Optional semantic fallback via LLM |
| Hotkeys | keyboard | Global hotkey registration |
| Internationalization | Custom dictionary | English & Chinese built-in |
Contributions are welcome, especially UI/UX design collaborators.
- Report bugs via GitHub Issues
- Submit pull requests for improvements
- Reach out for UI collaboration (see the tip at the top of this page)
This project is licensed under the MIT License — see the LICENSE file for details.
