VoirolClass

Chinese documentation

Tip

To be honest, UI design isn't my strong suit. If you're skilled in UI and interested in collaborating to improve this project, feel free to reach out!

A voice-controlled classroom assistant for teachers. Speak naturally to control slides, screens, volume, and applications — hands-free.

No keyboard or mouse required. Just say "next page", "black screen", or "open browser" to control classroom devices. Designed for Windows classroom environments, runs fully offline, and works smoothly on 4 GB RAM machines.

Features

Feature	Description
Voice Activity Detection	Silero VAD ONNX with configurable sensitivity, speech/silence duration, and a ring buffer that preserves ~1 s of audio history to avoid cutting off sentence starts
ASR Engine	SenseVoiceSmall (pure ONNX Runtime) running fully offline
Speaker Verification	CAM++ embedding via `speakeronnx` (192-dim L2-normalized vectors). Each teacher enrolls by reading 3-5 sentences; only their voice passes the similarity threshold
Command Matching	Three-tier strategy: exact, keyword (substring), or fuzzy (SequenceMatcher ratio). Falls back through the chain automatically, then to AI semantic matching (DeepSeek/OpenAI) when no keyword matches
AI Semantic Matching	Optional DeepSeek/OpenAI integration. Sends transcribed text to a configurable LLM to infer the intended command from natural language
Push-to-Talk & Voice Wake	Global hotkey `Ctrl`+`Alt`+`V` for push-to-talk; also supports pure voice wake via VAD
Multi-Teacher Profiles	Register, select, and delete teacher profiles at runtime through the settings dialog
Internationalization	English and Chinese UI; tray, settings, and pipeline logs all switch via configuration
Minimal GUI	System tray icon with context menu (Status, Settings, Mute, Quit); settings window with Voice Recognition / General / About tabs
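
The ring buffer in the Voice Activity Detection row can be pictured with a small sketch. This is not the project's code: the class name `PreRollBuffer` and the block size of 512 samples are assumptions made for illustration, and only the 16 kHz rate and the roughly 1 s of history come from this README.

```python
from collections import deque

SAMPLE_RATE = 16_000      # Hz, the capture rate stated in this README
BLOCK_SIZE = 512          # samples per block (assumed value for this sketch)
PRE_ROLL_SECONDS = 1.0    # roughly the ~1 s of history described above


class PreRollBuffer:
    """Hypothetical helper: keep recent blocks so an utterance includes its onset."""

    def __init__(self, seconds=PRE_ROLL_SECONDS, block_size=BLOCK_SIZE, rate=SAMPLE_RATE):
        max_blocks = max(1, int(seconds * rate / block_size))
        # deque(maxlen=...) drops the oldest block automatically once full.
        self._blocks = deque(maxlen=max_blocks)

    def push(self, block):
        """Store one audio block while no speech is being recorded."""
        self._blocks.append(block)

    def drain(self):
        """Return buffered blocks oldest-first and clear the buffer."""
        blocks = list(self._blocks)
        self._blocks.clear()
        return blocks
```

When the VAD reports the start of speech, the drained blocks would be placed in front of the new segment so the first syllable is not lost.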

Quick Start

pip install -r requirements.txt
python main.py

Right-click the tray icon → Settings... → register a teacher. Start speaking: "Next Page", "Mute", "Open Baidu".

Architecture

Microphone ─► AudioCapture ─► SileroVAD ─► SpeakerVerifier ─► ASR ─► CommandMatcher ─► Action
                                                         (SenseVoice)      │
                                                                           │ (fallback)
                                                                           ▼
                                                                    AIMatcher (AI)
                                                                    DeepSeek/OpenAI

AudioCapture reads 16 kHz PCM blocks from the microphone
SileroVAD runs an ONNX neural network on each block, accumulating speech segments
SpeakerVerifier extracts a CAM++ embedding and compares it to the enrolled teacher's profile
ASR (SenseVoice) transcribes the verified speech segment to text
CommandMatcher finds the best-matching command (exact → keyword → fuzzy)
AIMatcher (optional, configurable) falls back to an LLM (DeepSeek / OpenAI) when no keyword matches, parsing the response as JSON to determine the command
Action executes the command — keyboard shortcut, system call, or UI action

All components are decoupled and wired together by VoicePipeline in voirol/core/pipeline.py.
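
For the AIMatcher step, the README only says the LLM response is parsed as JSON. A minimal sketch of such parsing follows, assuming the model is asked to reply with an object like `{"command": "next_page"}`. The `"command"` key, the function name `parse_ai_reply`, and the `KNOWN_COMMANDS` subset are hypothetical and not taken from the project.

```python
import json

# Subset of the command names listed under Supported Commands.
KNOWN_COMMANDS = {"next_page", "prev_page", "black_screen", "white_screen", "mute"}


def parse_ai_reply(reply_text, known_commands=KNOWN_COMMANDS):
    """Return a command name from an LLM reply, or None if the reply is unusable.

    The {"command": ...} reply shape is an assumption of this sketch.
    """
    text = reply_text.strip()
    # Models sometimes wrap JSON in extra prose; keep the outermost {...} span.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        payload = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    command = payload.get("command")
    # Only accept names from the registry so an invented command never runs.
    if isinstance(command, str) and command in known_commands:
        return command
    return None
```

Checking the result against the known command names keeps a malformed or unexpected reply from triggering an action.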

Supported Commands

Category	Command	Action
Slide control	`next_page`, `prev_page`	`→` / `←`
Display	`black_screen`, `white_screen`	Monitor off / fullscreen white window
Application	`open_whiteboard`, `open_browser`, `open_file`	mspaint, browser launch, file picker
Audio	`volume_up`, `volume_down`, `mute`	System volume ±5, mute toggle
View	`fullscreen`, `esc`	`F11`, `Esc`
Input	`enter`, `space`	`Enter`, `Space`

Chinese keyword lists accompany each command (e.g. 下一页 / 下一张 for next_page).
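
The exact → keyword → fuzzy order can be sketched with `difflib.SequenceMatcher` from the standard library. This is an illustration under assumptions, not the project's matcher: the `KEYWORDS` dictionary and `match_command` are hypothetical, and only the `next_page` Chinese keywords come from this README.

```python
from difflib import SequenceMatcher

# Hypothetical registry: command name -> spoken keywords (small subset).
# Only the next_page Chinese keywords are from the README; the rest are illustrative.
KEYWORDS = {
    "next_page": ["next page", "下一页", "下一张"],
    "prev_page": ["previous page", "上一页"],
    "mute": ["mute", "静音"],
}


def match_command(text, keywords=KEYWORDS, fuzzy_threshold=0.8):
    """Return (command, tier) or (None, None) after trying exact, keyword, fuzzy."""
    spoken = text.strip().lower()
    if not spoken:
        return None, None

    # Tier 1: the whole utterance equals a keyword.
    for command, words in keywords.items():
        if spoken in words:
            return command, "exact"

    # Tier 2: a keyword occurs somewhere inside the utterance.
    for command, words in keywords.items():
        if any(word in spoken for word in words):
            return command, "keyword"

    # Tier 3: best SequenceMatcher ratio, accepted only at or above the threshold.
    best_command, best_ratio = None, 0.0
    for command, words in keywords.items():
        for word in words:
            ratio = SequenceMatcher(None, spoken, word).ratio()
            if ratio > best_ratio:
                best_command, best_ratio = command, ratio
    if best_ratio >= fuzzy_threshold:
        return best_command, "fuzzy"
    return None, None
```

A `(None, None)` result is the point at which the optional AI fallback would be consulted.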

Getting Started

Prerequisites

Python 3.10+
Windows 10/11
4 GB RAM minimum

Install

git clone <repo-url>
cd VoirolClass
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt

Configure

Edit config.toml to set language, microphone device, and ASR engine:

[general]
language = "en"         # or "zh"

[asr]
engine = "sensevoice"

The first run will automatically download the Silero VAD model (models/silero_vad.onnx) via mirror links.

Run

.venv\Scripts\python main.py

A tray icon appears in the taskbar. Right-click to open Settings, register a teacher, and start using voice commands.

Enrollment

Right-click tray icon → Settings...
Go to Voice Recognition tab → Register New Teacher
Enter a name and read the 5 sentences aloud when prompted
The system extracts your voiceprint and saves it

After enrollment, select your profile and start speaking. Only your voice will trigger commands.
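
The voiceprint check can be illustrated as follows. The README states that embeddings are L2-normalized and compared against a similarity threshold. Using cosine similarity and averaging the per-sentence embeddings at enrollment are assumptions of this sketch, as are the function names. Toy 3-dimensional vectors stand in for the 192-dimensional CAM++ embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def enroll(embeddings):
    """Average several per-sentence embeddings into one normalized voiceprint."""
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return l2_normalize(mean)


def is_enrolled_speaker(embedding, voiceprint, threshold=0.45):
    """Return (accepted, score); 0.45 is the documented default threshold.

    For unit-length vectors, cosine similarity is just the dot product.
    """
    probe = l2_normalize(embedding)
    score = sum(a * b for a, b in zip(probe, voiceprint))
    return score >= threshold, score
```

Raising `verification_threshold` makes the check stricter (fewer false accepts, more false rejects); lowering it does the opposite.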

Configuration

Key settings in config.toml:

Section	Key	Default	Description
`[general]`	`language`	`en`	UI language (`en` / `zh`)
`[vad]`	`threshold`	`0.25`	Speech probability threshold
	`min_speech_duration`	`0.5`	Minimum seconds of speech needed to trigger an utterance
	`silence_duration`	`1.0`	Seconds of silence to end utterance
`[voice]`	`verification_threshold`	`0.45`	Similarity threshold for speaker match
	`model_path`	`campplus-zh-en`	speakeronnx model name
`[asr]`	`engine`	`sensevoice`	`sensevoice`, `baidu`, `azure`, or `tencent`
`[commands]`	`match_mode`	`fuzzy`	`exact` / `keyword` / `fuzzy`
	`fuzzy_threshold`	`0.8`	Minimum SequenceMatcher ratio for a fuzzy match
`[hotkey]`	`push_to_talk`	`ctrl+alt+v`	PTT hotkey
`[ui]`	`font_size`	`13`	Font size (px)
	`border_radius`	`5`	Widget corner radius (px)
`[ai]`	`enabled`	`false`	Enable AI fallback matching
	`api_url`	`https://api.deepseek.com/v1`	OpenAI-compatible API endpoint
	`model`	`deepseek-chat`	Model name
	`temperature`	`0.1`	LLM temperature (0.0–2.0)
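
One way to read these settings with the defaults from the table applied is sketched below. It is not the project's loader: `load_config` and `DEFAULTS` are hypothetical names, and only some sections are included. `tomllib` ships with Python 3.11 and later; on Python 3.10 the `tomli` package offers the same interface.

```python
try:
    import tomllib                     # standard library from Python 3.11
except ModuleNotFoundError:            # Python 3.10: pip install tomli
    import tomli as tomllib

# Defaults copied from the table above (only some sections shown).
DEFAULTS = {
    "general": {"language": "en"},
    "vad": {"threshold": 0.25, "min_speech_duration": 0.5, "silence_duration": 1.0},
    "voice": {"verification_threshold": 0.45, "model_path": "campplus-zh-en"},
    "commands": {"match_mode": "fuzzy", "fuzzy_threshold": 0.8},
    "ai": {"enabled": False, "temperature": 0.1},
}


def load_config(toml_text, defaults=DEFAULTS):
    """Parse TOML text and fill in missing keys from the defaults."""
    user = tomllib.loads(toml_text)
    # Copy each section so the shared defaults are never mutated.
    merged = {section: dict(values) for section, values in defaults.items()}
    for section, values in user.items():
        if isinstance(values, dict):
            merged.setdefault(section, {}).update(values)
        else:
            merged[section] = values   # top-level scalar, kept as-is
    return merged


config = load_config('[general]\nlanguage = "zh"\n\n[vad]\nthreshold = 0.4\n')
```

Here `config["vad"]["threshold"]` comes from the user's text, while `config["vad"]["silence_duration"]` falls back to the documented default.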

Project Structure

voirol/
├── ai/                   # AI command matcher (DeepSeek/OpenAI)
├── asr/                  # SenseVoice ASR engine
├── audio/                # Capture, VAD, preprocessing
├── command/              # Command registry, matcher, actions
├── core/                 # Config & VoicePipeline
├── gui/                  # System tray & settings dialog (PyQt6)
├── utils/                # i18n, logging, download helpers
└── voice/                # Speaker verification & enrollment

Tech Stack

Component	Library	Notes
GUI	PyQt6	System tray + settings dialog
Audio capture	sounddevice	Callback-based PCM stream
VAD	Silero VAD ONNX	via onnxruntime
ASR	SenseVoiceSmall ONNX	Offline
Speaker verification	speakeronnx	CAM++ model, 192-dim embeddings
Command execution	pyautogui	Keyboard/mouse simulation
AI matching	DeepSeek / OpenAI API	Optional semantic fallback via LLM
Hotkeys	keyboard	Global hotkey registration
Internationalization	Custom dictionary	English & Chinese built-in

Contributing

Contributions are welcome, especially UI/UX design collaborators.

Report bugs via GitHub Issues
Submit pull requests for improvements
Reach out for UI collaboration (see the tip at the top of this page)

License

This project is licensed under the MIT License — see the LICENSE file for details.
