modelscope/FunASR


A Fundamental End-to-End Speech Recognition Toolkit

English | Simplified Chinese | Documentation | Paper


FunASR is an industrial-grade speech recognition toolkit developed by Tongyi Lab, Alibaba Group. It provides a unified Python interface for the complete speech understanding pipeline — from raw audio to speaker-attributed, punctuated transcripts — in a single API call.

```python
from funasr import AutoModel

model = AutoModel(
    model="paraformer-zh",       # speech recognition
    vad_model="fsmn-vad",        # voice activity detection
    punc_model="ct-punc",        # punctuation restoration
    spk_model="cam++",           # speaker diarization
)
res = model.generate(input="meeting.wav", batch_size_s=300)

for s in res[0]["sentence_info"]:
    print(f"[Speaker {s['spk']}] {s['text']}")
```

Output:

```text
[Speaker 0] 欢迎大家来体验达摩院推出的语音识别模型。
[Speaker 1] 非常感谢,今天我们主要讨论三个议题。
[Speaker 0] 好的,请开始吧。
```
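
The per-sentence results can be post-processed without touching the model. The sketch below merges consecutive sentences from the same speaker into speaker turns. It assumes only what the example above shows: each `sentence_info` entry is a dict with `spk` and `text` keys. The helper name `merge_speaker_turns` and the sample data are illustrative and are not part of FunASR.

```python
from itertools import groupby
from operator import itemgetter


def merge_speaker_turns(sentence_info):
    """Merge consecutive sentences spoken by the same speaker into one turn.

    Returns a list of (speaker_id, text) tuples in the original order.
    """
    turns = []
    for spk, group in groupby(sentence_info, key=itemgetter("spk")):
        # Sentences are concatenated as-is; no separator is inserted.
        turns.append((spk, "".join(s["text"] for s in group)))
    return turns


# Hand-written stand-in for res[0]["sentence_info"]; not real model output.
sample = [
    {"spk": 0, "text": "Welcome, everyone. "},
    {"spk": 0, "text": "Let's get started. "},
    {"spk": 1, "text": "Thank you. "},
    {"spk": 0, "text": "Please go ahead."},
]

for spk, text in merge_speaker_turns(sample):
    print(f"[Speaker {spk}] {text.strip()}")
```

Joining without a separator suits Chinese transcripts; for languages that separate words with spaces, a `" ".join(...)` may be the better choice.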


Key Features

  • Unified Pipeline: VAD, ASR, punctuation, speaker diarization, and emotion detection composed in one AutoModel() call
  • 50+ Languages: Fun-ASR-Nano (31 languages, including Chinese dialects) and Qwen3-ASR (52 languages with automatic language detection)
  • High Performance: Non-autoregressive inference; SenseVoice processes 10 s of audio in about 70 ms (RTF ≈ 0.007, 15× faster than Whisper)
  • Speaker Diarization: Per-sentence speaker labels, compatible with Paraformer, Fun-ASR-Nano, and SenseVoice
  • Emotion & Audio Events: SenseVoice classifies emotions (happy/sad/angry/neutral) and detects audio events such as background music (BGM), laughter, and applause
  • Production Ready: Fine-tune with DeepSpeed, export to ONNX, and deploy via the Docker runtime or the Python SDK
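
The speed figure in the list above is easier to compare across models as a real-time factor (RTF), conventionally defined as processing time divided by audio duration, so lower is faster. A minimal sketch of that arithmetic follows; the function name `real_time_factor` is illustrative, and the input numbers are the ones quoted for SenseVoice, not a new measurement.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures quoted above for SenseVoice: about 70 ms to process 10 s of audio.
rtf = real_time_factor(0.070, 10.0)
print(f"RTF = {rtf:.3f}")  # RTF = 0.007
```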

Installation

```bash
pip install -U funasr

# Or install from source for the latest models
git clone https://github.com/modelscope/FunASR.git && pip install -e ./FunASR
```

Models are downloaded automatically from ModelScope on first use (fast from within China). Add hub="hf" to download from Hugging Face instead.
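
One way to switch hubs in a single place is sketched below. It assumes that `hub` is passed as a keyword argument to `AutoModel()` and that short model names such as `paraformer-zh` resolve on both hubs; neither is spelled out above, so treat both as assumptions. The helper names `automodel_kwargs` and `load_model` are hypothetical.

```python
def automodel_kwargs(model, use_huggingface=False, **extra):
    """Build keyword arguments for funasr.AutoModel.

    hub="hf" is added only when Hugging Face downloads are requested;
    otherwise the hub is left unset so FunASR uses its default (ModelScope).
    """
    kwargs = {"model": model, **extra}
    if use_huggingface:
        kwargs["hub"] = "hf"  # assumption: accepted as an AutoModel keyword
    return kwargs


def load_model(model, use_huggingface=False, **extra):
    """Create an AutoModel; funasr is imported lazily so this file loads without it."""
    from funasr import AutoModel  # requires `pip install -U funasr`

    return AutoModel(**automodel_kwargs(model, use_huggingface, **extra))


# Inspect the arguments without downloading anything.
print(automodel_kwargs("paraformer-zh", use_huggingface=True, vad_model="fsmn-vad"))
```

Calling `load_model("paraformer-zh", use_huggingface=True)` would then trigger the actual download.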

Model Zoo

| Model | Type | Languages | Params | Inference Speed | Links |
|---|---|---|---|---|---|
| Fun-ASR-Nano | ASR | 31 | 800M | | MS · HF |
| Paraformer-zh | ASR | 2 | 220M | | MS · HF |
| SenseVoice | ASR+SER+AED | 5 | 234M | Very fast (70 ms / 10 s) | MS · HF |
| Qwen3-ASR | ASR (LLM) | 52 | 1.7B | | MS · HF |
| GLM-ASR-Nano | ASR (LLM) | 17 | 1.5B | | MS · HF |
| Paraformer-Streaming | ASR | 1 | 220M | Real-time | MS · HF |
| fsmn-vad | VAD | 2 | 0.4M | | MS · HF |
| ct-punc | Punctuation | 2 | 290M | | MS · HF |
| cam++ | Speaker | | 7.2M | | MS · HF |
| emotion2vec+ | Emotion | | 300M | | MS · HF |
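
As a rough aid for choosing among the ASR models, the sketch below copies the language counts and parameter sizes from the table above and picks the smallest model that covers at least a given number of languages. The names `MODEL_ZOO`, `param_count`, and `smallest_model` are illustrative. A language count says nothing about which languages are covered, so check the model card before relying on the result.

```python
# Rows copied from the Model Zoo table above (ASR-capable models only).
MODEL_ZOO = [
    {"name": "Fun-ASR-Nano", "languages": 31, "params": "800M"},
    {"name": "Paraformer-zh", "languages": 2, "params": "220M"},
    {"name": "SenseVoice", "languages": 5, "params": "234M"},
    {"name": "Qwen3-ASR", "languages": 52, "params": "1.7B"},
    {"name": "GLM-ASR-Nano", "languages": 17, "params": "1.5B"},
    {"name": "Paraformer-Streaming", "languages": 1, "params": "220M"},
]

_UNITS = {"M": 1e6, "B": 1e9}


def param_count(text):
    """Convert a size string such as '220M' or '1.7B' to a number of parameters."""
    return float(text[:-1]) * _UNITS[text[-1]]


def smallest_model(min_languages):
    """Return the name of the smallest listed model with at least `min_languages`.

    Returns None when no listed model qualifies; ties keep table order.
    """
    candidates = [m for m in MODEL_ZOO if m["languages"] >= min_languages]
    if not candidates:
        return None
    return min(candidates, key=lambda m: param_count(m["params"]))["name"]


for n in (1, 10, 40):
    print(n, smallest_model(n))
```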

Full model list → model_zoo/  |  Detailed usage → Documentation

What's New

| Date | Update |
|---|---|
| 2026.05 | Qwen3-ASR: 52 languages, LLM-based, automatic language detection |
| 2026.05 | GLM-ASR-Nano: 17 languages, dialect and low-volume optimization |
| 2026.05 | Speaker diarization support added for Fun-ASR-Nano and SenseVoice |
| 2025.12 | Fun-ASR-Nano: 31-language end-to-end ASR, trained on tens of millions of hours |
| 2024.07 | SenseVoice: multi-task speech understanding (ASR + emotion + events) |

See CHANGELOG.md for complete release history.

Learn More

| Resource | Description |
|---|---|
| Tutorial | Install, choose a model, run ASR/VAD/speaker diarization |
| Training Guide | Fine-tune Paraformer, SenseVoice, Fun-ASR-Nano on custom data |
| Developer Guide | Add a new model, understand the registry, test & contribute |
| API Reference | Auto-generated class & method docs with source links |
| Runtime / Deployment | File transcription service, real-time streaming service (CPU/GPU) |

Ecosystem

| Project | Description |
|---|---|
| Fun-ASR-Nano | Multi-language ASR large model: 31 languages, timestamps, hotwords, speaker diarization |
| SenseVoice | Multi-task speech understanding: ASR, language ID, emotion, audio events |
| FunClip | AI video clipping powered by FunASR and LLM-assisted editing |
| CosyVoice | Natural speech generation with multi-language, timbre, and emotion control |

Community

  • Issues & feature requests → GitHub Issues
  • Join our DingTalk discussion group

Citation

@inproceedings{gao2023funasr,
  author    = {Zhifu Gao and Zerui Li and Jiaming Wang and Haoneng Luo and Xian Shi and Mengzhe Chen and Yabin Li and Lingyun Zuo and Zhihao Du and Zhangyu Xiao and Shiliang Zhang},
  title     = {FunASR: A Fundamental End-to-End Speech Recognition Toolkit},
  booktitle = {INTERSPEECH},
  year      = {2023},
}

License

Code: MIT License  ·  Model weights: FunASR Model License (commercial use permitted with attribution)
