FunASR is an industrial-grade speech recognition toolkit developed by Tongyi Lab, Alibaba Group. It provides a unified Python interface for the complete speech understanding pipeline — from raw audio to speaker-attributed, punctuated transcripts — in a single API call.
```python
from funasr import AutoModel

model = AutoModel(
    model="paraformer-zh",   # speech recognition
    vad_model="fsmn-vad",    # voice activity detection
    punc_model="ct-punc",    # punctuation restoration
    spk_model="cam++",       # speaker diarization
)

res = model.generate(input="meeting.wav", batch_size_s=300)
for s in res[0]["sentence_info"]:
    print(f"[Speaker {s['spk']}] {s['text']}")
```

Output:

```
[Speaker 0] 欢迎大家来体验达摩院推出的语音识别模型。
[Speaker 1] 非常感谢,今天我们主要讨论三个议题。
[Speaker 0] 好的,请开始吧。
```
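The per-sentence output above can be post-processed into speaker turns. The following is a minimal sketch, assuming each `sentence_info` entry is a dict with the `spk` and `text` keys used in the example; `merge_speaker_turns` is a hypothetical helper, not part of FunASR, and the sample data is hand-written rather than real model output.

```python
from typing import Dict, List


def merge_speaker_turns(sentence_info: List[Dict]) -> List[Dict]:
    """Join consecutive sentences from the same speaker into one turn."""
    turns: List[Dict] = []
    for sentence in sentence_info:
        if turns and turns[-1]["spk"] == sentence["spk"]:
            # Same speaker as the previous sentence: extend the current turn.
            turns[-1]["text"] += sentence["text"]
        else:
            # First sentence, or the speaker changed: start a new turn.
            turns.append({"spk": sentence["spk"], "text": sentence["text"]})
    return turns


# Hand-written stand-in for res[0]["sentence_info"] (assumed shape).
sample = [
    {"spk": 0, "text": "Welcome everyone."},
    {"spk": 0, "text": " Let's get started."},
    {"spk": 1, "text": "Thank you."},
]

for turn in merge_speaker_turns(sample):
    print(f"[Speaker {turn['spk']}] {turn['text']}")
```

With real output, pass `res[0]["sentence_info"]` in place of `sample`.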
- Unified Pipeline: VAD, ASR, punctuation, speaker diarization, and emotion detection composed in one `AutoModel()` call
- 50+ Languages: Fun-ASR-Nano (31 languages incl. Chinese dialects), Qwen3-ASR (52 languages with auto-detection)
- High Performance: Non-autoregressive inference; SenseVoice processes 10 s of audio in 70 ms (15× faster than Whisper)
- Speaker Diarization: Per-sentence speaker labels, compatible with Paraformer, Fun-ASR-Nano, and SenseVoice
- Emotion & Audio Events: SenseVoice classifies emotions (happy/sad/angry/neutral) and detects BGM, laughter, and applause
- Production Ready: Fine-tune with DeepSpeed, export to ONNX, deploy via Docker runtime or Python SDK
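The SenseVoice speed figure can be restated as a real-time factor (RTF), conventionally defined as processing time divided by audio duration. The sketch below only does that arithmetic; `real_time_factor` is a hypothetical helper, and the implied Whisper time is derived from the 15× claim, not from a measurement.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; values below 1 are faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures quoted in the feature list: 70 ms to process 10 s of audio.
sensevoice_rtf = real_time_factor(0.070, 10.0)
print(f"SenseVoice RTF: {sensevoice_rtf:.3f}")

# Derived from the "15x faster" claim, not measured.
implied_whisper_seconds = 0.070 * 15
print(f"Implied Whisper time for 10 s of audio: {implied_whisper_seconds:.2f} s")
```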
```shell
pip install -U funasr

# Or install from source for the latest models
git clone https://github.com/modelscope/FunASR.git && pip install -e ./FunASR
```

Models are downloaded automatically from ModelScope (fast in China). Add `hub="hf"` to download from HuggingFace instead.
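Switching the download source might look like the sketch below. It assumes `hub="hf"` is passed as a keyword argument to `AutoModel`, as the note above suggests; `build_model_kwargs` and `load_model` are hypothetical helpers, and whether every short model name resolves on HuggingFace is not confirmed here.

```python
def build_model_kwargs(use_huggingface: bool = False) -> dict:
    """Assemble AutoModel keyword arguments, optionally targeting HuggingFace."""
    kwargs = {"model": "paraformer-zh"}
    if use_huggingface:
        # Assumption: omitting `hub` keeps the default ModelScope source.
        kwargs["hub"] = "hf"
    return kwargs


def load_model(use_huggingface: bool = False):
    """Create the model; triggers a download on first use."""
    # Imported lazily so build_model_kwargs works without funasr installed.
    from funasr import AutoModel

    return AutoModel(**build_model_kwargs(use_huggingface))


print(build_model_kwargs(use_huggingface=True))
```

Calling `load_model(use_huggingface=True)` requires `funasr` to be installed and network access.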
| Model | Type | Languages | Params | Inference Speed | Links |
|---|---|---|---|---|---|
| Fun-ASR-Nano | ASR | 31 | 800M | Medium | MS · HF |
| Paraformer-zh | ASR | 2 | 220M | Fast | MS · HF |
| SenseVoice | ASR+SER+AED | 5 | 234M | Very fast (70 ms / 10 s) | MS · HF |
| Qwen3-ASR | ASR (LLM) | 52 | 1.7B | Slow | MS · HF |
| GLM-ASR-Nano | ASR (LLM) | 17 | 1.5B | Slow | MS · HF |
| Paraformer-Streaming | ASR | 1 | 220M | Real-time | MS · HF |
| fsmn-vad | VAD | 2 | 0.4M | — | MS · HF |
| ct-punc | Punctuation | 2 | 290M | — | MS · HF |
| cam++ | Speaker | — | 7.2M | — | MS · HF |
| emotion2vec+ | Emotion | — | 300M | — | MS · HF |
Full model list → model_zoo/ | Detailed usage → Documentation
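One way to use the table when choosing a model is to encode its ASR rows and filter on them. The sketch below copies the language counts and parameter sizes from the table; `AsrModel` and `smallest_model_covering` are hypothetical names for illustration. A language count says nothing about which languages are covered, so treat the result as a starting point only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AsrModel:
    name: str
    languages: int          # number of supported languages, per the table
    params_millions: float  # parameter count in millions, per the table


# ASR rows copied from the model table above.
ASR_MODELS: List[AsrModel] = [
    AsrModel("Fun-ASR-Nano", 31, 800),
    AsrModel("Paraformer-zh", 2, 220),
    AsrModel("SenseVoice", 5, 234),
    AsrModel("Qwen3-ASR", 52, 1700),
    AsrModel("GLM-ASR-Nano", 17, 1500),
    AsrModel("Paraformer-Streaming", 1, 220),
]


def smallest_model_covering(min_languages: int) -> Optional[AsrModel]:
    """Return the smallest model (by parameters) with at least min_languages."""
    candidates = [m for m in ASR_MODELS if m.languages >= min_languages]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m.params_millions)


choice = smallest_model_covering(20)
print(choice.name if choice else "no model in the table covers that many languages")
```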
| Date | Update |
|---|---|
| 2026.05 | Qwen3-ASR — 52 languages, LLM-based, auto language detection |
| 2026.05 | GLM-ASR-Nano — 17 languages, dialect & low-volume optimization |
| 2026.05 | Speaker diarization support added for Fun-ASR-Nano and SenseVoice |
| 2025.12 | Fun-ASR-Nano — 31-language end-to-end ASR, trained on tens of millions of hours |
| 2024.07 | SenseVoice — multi-task speech understanding (ASR + emotion + events) |
See CHANGELOG.md for complete release history.
| Resource | Description |
|---|---|
| Tutorial | Install, choose a model, run ASR/VAD/speaker diarization |
| Training Guide | Fine-tune Paraformer, SenseVoice, Fun-ASR-Nano on custom data |
| Developer Guide | Add a new model, understand the registry, test & contribute |
| API Reference | Auto-generated class & method docs with source links |
| Runtime / Deployment | File transcription service, real-time streaming service (CPU/GPU) |
| Project | Description |
|---|---|
| Fun-ASR-Nano | Multi-language ASR large model — 31 languages, timestamps, hotwords, speaker diarization |
| SenseVoice | Multi-task speech understanding — ASR, language ID, emotion, audio events |
| FunClip | AI video clipping powered by FunASR and LLM-assisted editing |
| CosyVoice | Natural speech generation with multi-language, timbre, and emotion control |
- Issues & feature requests → GitHub Issues
- Join our DingTalk discussion group:
```bibtex
@inproceedings{gao2023funasr,
  author    = {Zhifu Gao and Zerui Li and Jiaming Wang and Haoneng Luo and Xian Shi and Mengzhe Chen and Yabin Li and Lingyun Zuo and Zhihao Du and Zhangyu Xiao and Shiliang Zhang},
  title     = {FunASR: A Fundamental End-to-End Speech Recognition Toolkit},
  booktitle = {INTERSPEECH},
  year      = {2023},
}
```

Code: MIT License · Model weights: FunASR Model License (commercial use permitted with attribution)

