VoirolClass

Python 3.10+ · License: MIT · Platform: Windows

Chinese documentation (中文文档)

Tip

To be honest, UI design isn't my strong suit. If you're skilled in UI and interested in collaborating to improve this project, feel free to reach out!


A voice-controlled classroom assistant for teachers. Speak naturally to control slides, screens, volume, and applications — hands-free.

No keyboard or mouse required. Just say "next page", "black screen", or "open browser" to control classroom devices. Designed for Windows classroom environments, it runs fully offline in the default configuration and works smoothly on machines with 4 GB of RAM.


Features

| Feature | Description |
| --- | --- |
| Voice Activity Detection | Silero VAD ONNX with configurable sensitivity, speech/silence duration, and a ring buffer that preserves ~1 s of audio history to avoid cutting off sentence starts |
| ASR Engine | SenseVoiceSmall (pure ONNX Runtime) running fully offline |
| Speaker Verification | CAM++ embedding via speakeronnx (192-dim L2-normalized vectors). Each teacher enrolls by reading 3-5 sentences; only their voice passes the similarity threshold |
| Command Matching | Three-tier strategy: exact, keyword (substring), or fuzzy (SequenceMatcher ratio). Falls back through the chain automatically, then to AI semantic matching (DeepSeek/OpenAI) when no keyword matches |
| AI Semantic Matching | Optional DeepSeek/OpenAI integration. Sends transcribed text to a configurable LLM to infer the intended command from natural language |
| Push-to-Talk & Voice Wake | Global hotkey Ctrl+Alt+V for push-to-talk; also supports pure voice wake via VAD |
| Multi-Teacher Profiles | Register, select, and delete teacher profiles at runtime through the settings dialog |
| Internationalization | English and Chinese UI; tray, settings, and pipeline logs all switch via configuration |
| Minimal GUI | System tray icon with context menu (Status, Settings, Mute, Quit); settings window with Voice Recognition / General / About tabs |
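
The ring buffer mentioned under Voice Activity Detection can be pictured as a fixed-length queue of recent audio blocks that is prepended to an utterance once speech is detected. The following is a minimal sketch under stated assumptions, not the project's actual code: `PreRollBuffer`, its method names, and the 512-sample block size are hypothetical and chosen only for illustration.

```python
from collections import deque

SAMPLE_RATE = 16_000     # Hz, the capture rate described in this README
BLOCK_SIZE = 512         # samples per block (assumed for illustration)
PRE_ROLL_SECONDS = 1.0   # roughly the ~1 s of history mentioned above


class PreRollBuffer:
    """Keeps only the most recent blocks so speech onsets are not clipped."""

    def __init__(self, seconds=PRE_ROLL_SECONDS,
                 sample_rate=SAMPLE_RATE, block_size=BLOCK_SIZE):
        max_blocks = max(1, int(seconds * sample_rate / block_size))
        # A deque with maxlen discards the oldest block automatically.
        self._blocks = deque(maxlen=max_blocks)

    def push(self, block):
        self._blocks.append(block)

    def drain(self):
        """Return the buffered history (oldest first) and empty the buffer."""
        blocks = list(self._blocks)
        self._blocks.clear()
        return blocks


buf = PreRollBuffer()
for i in range(100):              # simulate 100 incoming blocks
    buf.push([i] * BLOCK_SIZE)
history = buf.drain()             # blocks to prepend when speech starts
```

With these numbers the buffer holds 31 blocks (just under one second), so only the latest 31 of the 100 simulated blocks survive.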

Quick Start

pip install -r requirements.txt
python main.py

Right-click the tray icon → Settings... → register a teacher. Start speaking: "Next Page", "Mute", "Open Baidu".


Architecture

Microphone ─► AudioCapture ─► SileroVAD ─► SpeakerVerifier ─► ASR ─► CommandMatcher ─► Action
                                                                          │
                                                                   (SenseVoice)
                                                                          │
                                                               (fallback)  │
                                                                          ▼
                                                                    AIMatcher (AI)
                                                                   DeepSeek/OpenAI
  1. AudioCapture reads 16 kHz PCM blocks from the microphone
  2. SileroVAD runs an ONNX neural network on each block, accumulating speech segments
  3. SpeakerVerifier extracts a CAM++ embedding and compares it to the enrolled teacher's profile
  4. ASR (SenseVoice) transcribes the verified speech segment to text
  5. CommandMatcher finds the best-matching command (exact → keyword → fuzzy)
  6. AIMatcher (optional, configurable) falls back to an LLM (DeepSeek / OpenAI) when no keyword matches, parsing the response as JSON to determine the command
  7. Action executes the command — keyboard shortcut, system call, or UI action

All components are decoupled and wired together by VoicePipeline in voirol/core/pipeline.py.
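
Step 6 says the LLM response is parsed as JSON to determine the command. A hedged sketch of that parsing step could look like the following. The `{"command": ...}` response schema, the `parse_ai_reply` name, and the `KNOWN_COMMANDS` subset are assumptions for illustration; the project's real prompt and schema are not documented here.

```python
import json
import re

# A subset of the command names listed in this README.
KNOWN_COMMANDS = {"next_page", "prev_page", "black_screen", "mute"}


def parse_ai_reply(reply):
    """Return a known command name, or None if the reply is unusable."""
    # Models sometimes wrap JSON in extra prose, so take the span from the
    # first "{" to the last "}" instead of parsing the whole reply.
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    command = data.get("command")
    # Reject anything that is not an explicitly registered command.
    if isinstance(command, str) and command in KNOWN_COMMANDS:
        return command
    return None


print(parse_ai_reply('{"command": "next_page"}'))
print(parse_ai_reply('{"command": "format_disk"}'))
```

Validating against a fixed set of command names means an unexpected or malformed reply results in no action rather than an arbitrary one.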


Supported Commands

| Category | Command | Action |
| --- | --- | --- |
| Slide control | next_page, prev_page | / |
| Display | black_screen, white_screen | Monitor off / fullscreen white window |
| Application | open_whiteboard, open_browser, open_file | mspaint, browser launch, file picker |
| Audio | volume_up, volume_down, mute | System volume ±5, mute toggle |
| View | fullscreen, esc | F11, Esc |
| Input | enter, space | Enter, Space |

Chinese keyword lists accompany each command (e.g. 下一页 / 下一张 for next_page).
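
To illustrate how keyword lists feed the exact → keyword → fuzzy chain, here is a small sketch built on `difflib.SequenceMatcher` from the standard library. It is an assumption-laden illustration rather than the project's matcher: `match_command` and the `KEYWORDS` table are hypothetical, and only 下一页 / 下一张 are keywords confirmed by this README.

```python
from difflib import SequenceMatcher

# Illustrative keyword lists; only 下一页 / 下一张 come from this README.
KEYWORDS = {
    "next_page": ["next page", "下一页", "下一张"],
    "prev_page": ["previous page", "上一页"],
    "mute": ["mute", "静音"],
}


def match_command(text, fuzzy_threshold=0.8):
    """Return (command, tier), or (None, None) when nothing matches."""
    text = text.strip().lower()
    # Tier 1: the whole utterance equals a keyword.
    for command, keywords in KEYWORDS.items():
        if text in keywords:
            return command, "exact"
    # Tier 2: a keyword appears somewhere inside the utterance.
    for command, keywords in KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return command, "keyword"
    # Tier 3: best SequenceMatcher ratio, accepted only above the threshold.
    best_command, best_ratio = None, 0.0
    for command, keywords in KEYWORDS.items():
        for keyword in keywords:
            ratio = SequenceMatcher(None, text, keyword).ratio()
            if ratio > best_ratio:
                best_command, best_ratio = command, ratio
    if best_ratio >= fuzzy_threshold:
        return best_command, "fuzzy"
    return None, None


print(match_command("下一张"))
print(match_command("please go to the next page"))
print(match_command("nex page"))
```

A `(None, None)` result is the point where an optional AI fallback would take over.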


Getting Started

Prerequisites

  • Python 3.10+
  • Windows 10/11
  • 4 GB RAM minimum

Install

git clone <repo-url>
cd VoirolClass
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt

Configure

Edit config.toml to set language, microphone device, and ASR engine:

[general]
language = "en"         # or "zh"

[asr]
engine = "sensevoice"

The first run will automatically download the Silero VAD model (models/silero_vad.onnx) via mirror links.
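
As a rough illustration of how the `[general]` and `[asr]` settings might be read and validated, the sketch below parses the same TOML fragment from a string. It assumes the standard-library `tomllib` (Python 3.11+, with the third-party `tomli` package as a fallback on 3.10); `load_settings` and the defaulting behavior are hypothetical, not the project's config loader.

```python
try:
    import tomllib                 # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib        # third-party backport with the same API

CONFIG_TEXT = """
[general]
language = "en"

[asr]
engine = "sensevoice"
"""

SUPPORTED_LANGUAGES = {"en", "zh"}
SUPPORTED_ENGINES = {"sensevoice", "baidu", "azure", "tencent"}


def load_settings(text):
    """Parse TOML text and return validated language/engine settings."""
    config = tomllib.loads(text)
    # Missing sections or keys fall back to the documented defaults.
    language = config.get("general", {}).get("language", "en")
    engine = config.get("asr", {}).get("engine", "sensevoice")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported ASR engine: {engine!r}")
    return {"language": language, "engine": engine}


settings = load_settings(CONFIG_TEXT)
print(settings)
```

Failing early on an unknown value makes a typo in config.toml visible at startup instead of at the first voice command.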

Run

.venv\Scripts\python main.py

A tray icon appears in the system tray (the taskbar notification area). Right-click it to open Settings, register a teacher, and start using voice commands.

Enrollment

  1. Right-click tray icon → Settings...
  2. Go to Voice Recognition tab → Register New Teacher
  3. Enter a name and read the 5 sentences aloud when prompted
  4. The system extracts your voiceprint and saves it

After enrollment, select your profile and start speaking. Only your voice will trigger commands.
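
The idea behind "only your voice will trigger commands" can be sketched with plain cosine similarity on L2-normalized embeddings. This is a toy example under assumptions: averaging the enrollment embeddings into one profile is a guess at a common approach, the function names are hypothetical, and 3-dimensional vectors stand in for the 192-dimensional CAM++ embeddings.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def build_profile(embeddings):
    """Average the enrollment embeddings and re-normalize the result."""
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return l2_normalize(mean)


def is_enrolled_speaker(profile, embedding, threshold=0.45):
    """Return (accepted, similarity) for one utterance embedding."""
    # For unit-length vectors the dot product equals the cosine similarity.
    candidate = l2_normalize(embedding)
    similarity = sum(p * q for p, q in zip(profile, candidate))
    return similarity >= threshold, similarity


# Toy enrollment: three embeddings from the same (pretend) teacher.
enrollment = [l2_normalize(v) for v in ([1.0, 0.1, 0.0],
                                        [0.9, 0.0, 0.1],
                                        [1.0, 0.0, 0.0])]
profile = build_profile(enrollment)
same, s1 = is_enrolled_speaker(profile, [0.95, 0.05, 0.05])
other, s2 = is_enrolled_speaker(profile, [0.0, 1.0, 0.0])
print(same, other)
```

The default of 0.45 mirrors `verification_threshold` in config.toml; raising it makes verification stricter at the cost of more false rejections.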


Configuration

Key settings in config.toml:

| Section | Key | Default | Description |
| --- | --- | --- | --- |
| [general] | language | en | UI language (en / zh) |
| [vad] | threshold | 0.25 | Speech probability threshold |
| | min_speech_duration | 0.5 | Seconds of speech to trigger |
| | silence_duration | 1.0 | Seconds of silence to end utterance |
| [voice] | verification_threshold | 0.45 | Similarity threshold for speaker match |
| | model_path | campplus-zh-en | speakeronnx model name |
| [asr] | engine | sensevoice | sensevoice, baidu, azure, or tencent |
| [commands] | match_mode | fuzzy | exact / keyword / fuzzy |
| | fuzzy_threshold | 0.8 | SequenceMatcher ratio |
| [hotkey] | push_to_talk | ctrl+alt+v | PTT hotkey |
| [ui] | font_size | 13 | Font size (px) |
| | border_radius | 5 | Widget corner radius (px) |
| [ai] | enabled | false | Enable AI fallback matching |
| | api_url | https://api.deepseek.com/v1 | OpenAI-compatible API endpoint |
| | model | deepseek-chat | Model name |
| | temperature | 0.1 | LLM temperature (0.0–2.0) |
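
To make the three `[vad]` settings more concrete, the sketch below turns a stream of per-block speech probabilities into utterance boundaries. It reflects one plausible reading of the settings, not the project's actual logic: `segment_utterances` is a hypothetical helper, and treating `min_speech_duration` as the minimum accumulated speech needed to keep a segment is an assumption.

```python
def segment_utterances(probabilities, block_seconds, threshold=0.25,
                       min_speech_duration=0.5, silence_duration=1.0):
    """Return (start_block, end_block) pairs, with end_block exclusive."""
    segments = []
    start = None          # first speech block of the current candidate
    last_speech = None    # most recent speech block of the candidate
    speech_blocks = 0     # speech blocks accumulated in the candidate
    silence_blocks = 0    # consecutive non-speech blocks since last speech

    for i, probability in enumerate(probabilities):
        if probability >= threshold:
            if start is None:
                start, speech_blocks = i, 0
            speech_blocks += 1
            silence_blocks = 0
            last_speech = i
        elif start is not None:
            silence_blocks += 1
            if silence_blocks * block_seconds >= silence_duration:
                # Enough trailing silence: close the candidate and keep it
                # only if it contained enough speech.
                if speech_blocks * block_seconds >= min_speech_duration:
                    segments.append((start, last_speech + 1))
                start = None

    # Flush a candidate that is still open when the stream ends.
    if start is not None and speech_blocks * block_seconds >= min_speech_duration:
        segments.append((start, last_speech + 1))
    return segments


probs = [0.1, 0.9, 0.8, 0.9, 0.1, 0.1, 0.1, 0.1, 0.6, 0.1, 0.1, 0.1, 0.1, 0.1]
print(segment_utterances(probs, block_seconds=0.25))
```

In this example the three-block burst is kept as an utterance, while the isolated single block at index 8 is dropped because it falls short of `min_speech_duration`.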

Project Structure

voirol/
├── ai/                   # AI command matcher (DeepSeek/OpenAI)
├── asr/                  # SenseVoice ASR engine
├── audio/                # Capture, VAD, preprocessing
├── command/              # Command registry, matcher, actions
├── core/                 # Config & VoicePipeline
├── gui/                  # System tray & settings dialog (PyQt6)
├── utils/                # i18n, logging, download helpers
└── voice/                # Speaker verification & enrollment

Tech Stack

| Component | Library | Notes |
| --- | --- | --- |
| GUI | PyQt6 | System tray + settings dialog |
| Audio capture | sounddevice | Callback-based PCM stream |
| VAD | Silero VAD | ONNX via onnxruntime |
| ASR | SenseVoiceSmall ONNX | Offline |
| Speaker verification | speakeronnx | CAM++ model, 192-dim embeddings |
| Command execution | pyautogui | Keyboard/mouse simulation |
| AI matching | DeepSeek / OpenAI API | Optional semantic fallback via LLM |
| Hotkeys | keyboard | Global hotkey registration |
| Internationalization | Custom dictionary | English & Chinese built-in |

Contributing

Contributions are welcome, especially UI/UX design collaborators.

  • Report bugs via GitHub Issues
  • Submit pull requests for improvements
  • Reach out for UI collaboration (see the tip at the top of this page)

License

This project is licensed under the MIT License — see the LICENSE file for details.
