modelscope/FunASR: A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

A Fundamental End-to-End Speech Recognition Toolkit

FunASR is an industrial-grade speech recognition toolkit developed by Tongyi Lab, Alibaba Group. It provides a unified Python interface for the complete speech understanding pipeline — from raw audio to speaker-attributed, punctuated transcripts — in a single API call.

from funasr import AutoModel

model = AutoModel(
    model="paraformer-zh",       # speech recognition
    vad_model="fsmn-vad",        # voice activity detection
    punc_model="ct-punc",        # punctuation restoration
    spk_model="cam++",           # speaker diarization
)
res = model.generate(input="meeting.wav", batch_size_s=300)  # batch_size_s: dynamic batching by total audio seconds per batch

for s in res[0]["sentence_info"]:
    print(f"[Speaker {s['spk']}] {s['text']}")

Output:

[Speaker 0] 欢迎大家来体验达摩院推出的语音识别模型。
[Speaker 1] 非常感谢，今天我们主要讨论三个议题。
[Speaker 0] 好的，请开始吧。
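As an illustration of working with that result, the sketch below merges consecutive sentences from the same speaker into single turns. It assumes only what the loop above already reads: a list of dicts with `spk` and `text` keys. `merge_speaker_turns` is a hypothetical helper, not part of FunASR, and the sample data is a made-up variation of the output shown above.

```python
from __future__ import annotations

from itertools import groupby
from typing import Any


def merge_speaker_turns(
    sentence_info: list[dict[str, Any]], sep: str = ""
) -> list[tuple[int, str]]:
    """Join consecutive sentences that share a speaker label into one turn.

    Assumes each entry carries the 'spk' and 'text' keys used in the quick start.
    """
    turns = []
    # groupby only merges *adjacent* items, so speakers A, B, A stay three turns.
    for spk, group in groupby(sentence_info, key=lambda s: s["spk"]):
        turns.append((spk, sep.join(s["text"] for s in group)))
    return turns


if __name__ == "__main__":
    # Hypothetical sample shaped like res[0]["sentence_info"].
    sample = [
        {"spk": 0, "text": "欢迎大家来体验达摩院推出的语音识别模型。"},
        {"spk": 1, "text": "非常感谢，今天我们主要讨论三个议题。"},
        {"spk": 0, "text": "好的，"},
        {"spk": 0, "text": "请开始吧。"},
    ]
    for spk, text in merge_speaker_turns(sample):
        print(f"[Speaker {spk}] {text}")
```

The default `sep=""` suits Chinese output, which has no spaces between sentences; pass `sep=" "` for space-delimited languages. Because `itertools.groupby` only merges adjacent items, a speaker who returns later starts a new turn.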

Key Features

Unified Pipeline — VAD, ASR, punctuation, speaker diarization, emotion detection composed in one AutoModel() call
50+ Languages — Fun-ASR-Nano (31 languages incl. Chinese dialects), Qwen3-ASR (52 languages with auto-detection)
High Performance — Non-autoregressive inference; SenseVoice processes 10 s of audio in about 70 ms (an RTF of roughly 0.007), 15× faster than Whisper
Speaker Diarization — Per-sentence speaker labels, compatible with Paraformer, Fun-ASR-Nano, and SenseVoice
Emotion & Audio Events — SenseVoice classifies emotions (happy/sad/angry/neutral) and detects audio events such as BGM, laughter, and applause
Production Ready — Fine-tune with DeepSpeed, export to ONNX, deploy via Docker runtime or Python SDK
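Real-time factor (RTF) is processing time divided by audio duration, so the SenseVoice figure of 70 ms for 10 s of audio corresponds to 0.07 / 10 = 0.007. The sketch below spells out that arithmetic. The 15× speed-up comes from the feature list; the baseline RTF it implies is derived from that ratio, not a measured number.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; values below 1.0 are faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


sensevoice_rtf = real_time_factor(0.070, 10.0)  # 70 ms to process 10 s of audio
# A 15x speed-up means the baseline needs 15 times as long for the same audio.
implied_baseline_rtf = sensevoice_rtf * 15

print(f"SenseVoice RTF: {sensevoice_rtf:.3f}")
print(f"Implied baseline RTF: {implied_baseline_rtf:.3f}")
```

Lower is better for RTF, so "15× faster" means the baseline RTF is 15 times larger, not smaller.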

Installation

pip install -U funasr

# Or install from source for latest models
git clone https://github.com/modelscope/FunASR.git && pip install -e ./FunASR

Models are downloaded automatically from ModelScope (typically faster within mainland China). Pass hub="hf" to AutoModel to download from Hugging Face instead.
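A minimal sketch of switching the quick-start pipeline between hubs by passing `hub` to `AutoModel`. Assumptions: `"ms"` is the default (ModelScope) value, and the short aliases such as `paraformer-zh` also resolve on Hugging Face; check the model pages if they do not. `pipeline_kwargs` and `load_pipeline` are hypothetical helpers, and `funasr` is imported lazily so the argument-building part runs without the package installed.

```python
from __future__ import annotations

from typing import Any

# Model aliases from the quick-start pipeline; assumed to resolve on both hubs.
PIPELINE = {
    "model": "paraformer-zh",
    "vad_model": "fsmn-vad",
    "punc_model": "ct-punc",
    "spk_model": "cam++",
}


def pipeline_kwargs(hub: str = "ms") -> dict[str, Any]:
    """Build AutoModel keyword arguments for a download hub ("ms" or "hf")."""
    if hub not in ("ms", "hf"):
        raise ValueError(f"unsupported hub: {hub!r}")
    return {**PIPELINE, "hub": hub}


def load_pipeline(hub: str = "ms"):
    """Construct the pipeline; needs `pip install -U funasr` and network access."""
    from funasr import AutoModel  # lazy import: pipeline_kwargs works without funasr

    return AutoModel(**pipeline_kwargs(hub))


if __name__ == "__main__":
    print(pipeline_kwargs("hf"))
```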

Model Zoo

Model	Type	Languages	Params	Inference Speed	Links (MS = ModelScope, HF = Hugging Face)
Fun-ASR-Nano	ASR	31	800M	Medium	MS · HF
Paraformer-zh	ASR	2	220M	Fast	MS · HF
SenseVoice	ASR+SER+AED	5	234M	Very fast (70 ms per 10 s of audio)	MS · HF
Qwen3-ASR	ASR (LLM)	52	1.7B	Slow	MS · HF
GLM-ASR-Nano	ASR (LLM)	17	1.5B	Slow	MS · HF
Paraformer-Streaming	ASR	1	220M	Real-time	MS · HF
fsmn-vad	VAD	2	0.4M	—	MS · HF
ct-punc	Punctuation	2	290M	—	MS · HF
cam++	Speaker	—	7.2M	—	MS · HF
emotion2vec+	Emotion	—	300M	—	MS · HF
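Paraformer-Streaming is listed as real-time: streaming recognition feeds audio to the model in fixed-size chunks and marks the last chunk so the decoder can flush its state. The sketch below does only the chunking, with the standard library. The 16 kHz sample rate and 600 ms chunk length are assumptions (FunASR's streaming examples use `chunk_size=[0, 10, 5]`, i.e. ten 60 ms frames of 960 samples each), and the `model.generate(..., cache=..., is_final=...)` call is left as a comment because its exact arguments should be checked against the FunASR documentation.

```python
from __future__ import annotations

from typing import Iterator, Sequence

SAMPLE_RATE = 16_000  # assumed: 16 kHz mono PCM samples
CHUNK_MS = 600        # assumed chunk length: 10 frames x 60 ms
CHUNK_STRIDE = SAMPLE_RATE * CHUNK_MS // 1000  # 9600 samples per chunk


def iter_chunks(
    samples: Sequence[float], stride: int = CHUNK_STRIDE
) -> Iterator[tuple[Sequence[float], bool]]:
    """Yield (chunk, is_final) pairs that cover every sample exactly once.

    Always yields at least one chunk so the final flush happens even for empty input.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    n_chunks = max(1, -(-len(samples) // stride))  # ceiling division
    for i in range(n_chunks):
        yield samples[i * stride:(i + 1) * stride], i == n_chunks - 1


if __name__ == "__main__":
    audio = [0.0] * 25_000  # placeholder: about 1.56 s of silence at 16 kHz
    # cache = {}  # assumed: one dict reused across calls to carry streaming state
    for chunk, is_final in iter_chunks(audio):
        # Assumed call shape, based on FunASR's streaming examples:
        # res = model.generate(input=chunk, cache=cache, is_final=is_final,
        #                      chunk_size=[0, 10, 5], encoder_chunk_look_back=4,
        #                      decoder_chunk_look_back=1)
        print(len(chunk), is_final)
```

Run as a script, this prints three chunks of 9600, 9600, and 5800 samples, with only the last marked final.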

Full model list → model_zoo/ | Detailed usage → Documentation
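To make the trade-offs in the Model Zoo table concrete, here is a hypothetical helper that picks the smallest ASR model (by parameter count) covering at least a given number of languages. The rows are transcribed from the table; `ZooEntry` and `smallest_asr_model` are illustrative names, not FunASR APIs, and parameter count is only a rough proxy for cost.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class ZooEntry(NamedTuple):
    name: str
    task: str
    languages: int
    params_m: float  # parameter count in millions


# ASR rows transcribed from the Model Zoo table.
ASR_MODELS = [
    ZooEntry("Fun-ASR-Nano", "ASR", 31, 800.0),
    ZooEntry("Paraformer-zh", "ASR", 2, 220.0),
    ZooEntry("SenseVoice", "ASR+SER+AED", 5, 234.0),
    ZooEntry("Qwen3-ASR", "ASR (LLM)", 52, 1700.0),
    ZooEntry("GLM-ASR-Nano", "ASR (LLM)", 17, 1500.0),
    ZooEntry("Paraformer-Streaming", "ASR", 1, 220.0),
]


def smallest_asr_model(
    min_languages: int, entries: list[ZooEntry] = ASR_MODELS
) -> Optional[ZooEntry]:
    """Return the smallest ASR model covering at least `min_languages`, or None."""
    candidates = [
        e for e in entries if e.task.startswith("ASR") and e.languages >= min_languages
    ]
    # On ties, min() keeps the first entry in table order.
    return min(candidates, key=lambda e: e.params_m, default=None)


if __name__ == "__main__":
    for n in (2, 5, 17, 50):
        print(n, smallest_asr_model(n))
```

Under these assumptions, asking for 17 or more languages returns Fun-ASR-Nano (800M) rather than the larger LLM-based GLM-ASR-Nano or Qwen3-ASR; the Inference Speed column and dialect coverage may still favor a different choice.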

What's New

Date	Update
2026.05	Qwen3-ASR — 52 languages, LLM-based, auto language detection
2026.05	GLM-ASR-Nano — 17 languages, dialect & low-volume optimization
2026.05	Speaker diarization support added for Fun-ASR-Nano and SenseVoice
2025.12	Fun-ASR-Nano — 31-language end-to-end ASR, trained on tens of millions of hours
2024.07	SenseVoice — multi-task speech understanding (ASR + emotion + events)

See CHANGELOG.md for complete release history.

Learn More

Resource	Description
Tutorial	Install, choose a model, run ASR/VAD/speaker diarization
Training Guide	Fine-tune Paraformer, SenseVoice, Fun-ASR-Nano on custom data
Developer Guide	Add a new model, understand the registry, test & contribute
API Reference	Auto-generated class & method docs with source links
Runtime / Deployment	File transcription service, real-time streaming service (CPU/GPU)

Ecosystem

Project	Description
Fun-ASR-Nano	Multi-language ASR large model — 31 languages, timestamps, hotwords, speaker diarization
SenseVoice	Multi-task speech understanding — ASR, language ID, emotion, audio events
FunClip	AI video clipping powered by FunASR and LLM-assisted editing
CosyVoice	Natural speech generation with multi-language, timbre, and emotion control

Community

Issues & feature requests → GitHub Issues
Join our DingTalk discussion group:

Citation

@inproceedings{gao2023funasr,
  author    = {Zhifu Gao and Zerui Li and Jiaming Wang and Haoneng Luo and Xian Shi and Mengzhe Chen and Yabin Li and Lingyun Zuo and Zhihao Du and Zhangyu Xiao and Shiliang Zhang},
  title     = {FunASR: A Fundamental End-to-End Speech Recognition Toolkit},
  booktitle = {INTERSPEECH},
  year      = {2023},
}

License

Code: MIT License · Model weights: FunASR Model License (commercial use permitted with attribution)
