pgaard/stemseparation

Stem Separator

A local web app that splits a song into individual stems (up to six: vocals, drums, bass, guitar, piano, and other) using AI, plus a click-track generator that detects the tempo and lays down a metronome synced to your song.

Built on Demucs for source separation and librosa for beat tracking, with a Gradio UI.

Features

Stem separation

  • Splits audio into up to 6 stems: vocals, drums, bass, guitar, piano, other.
  • Multiple Demucs models:
    Model         Notes
    htdemucs      Default hybrid transformer; best all-round quality
    htdemucs_ft   Fine-tuned; slower, higher quality
    htdemucs_6s   6 stems (adds guitar + piano)
    mdx_extra     MDX-Net based
  • Quality presets that trade speed for fidelity:
    • Fast: 1 shift, 0.25 overlap
    • Better (4x slower): forces htdemucs_ft
    • Best (20x slower): htdemucs_ft, 5 shifts, 0.5 overlap
  • Separation modes: all stems, or two-stem splits (vocals / drums / bass vs. the rest).
  • Output format: WAV (default) or MP3 with selectable bitrate (128–320 kbps).
  • GPU acceleration: automatically uses CUDA when available, otherwise falls back to CPU.
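
The options above map fairly directly onto Demucs command-line flags (-n, --shifts, --overlap, --two-stems, --mp3, --mp3-bitrate, -o, -d). The following is a minimal sketch of that mapping, not the code in app.py: the function name build_demucs_args and the PRESETS table are hypothetical, and since the list above does not state shifts or overlap for the "Better" preset, the sketch leaves them at Demucs' defaults.

```python
# Hypothetical preset table built from the list above.
# None means "do not pass the flag; keep Demucs' default".
PRESETS = {
    "Fast": {"model": None, "shifts": 1, "overlap": 0.25},
    "Better": {"model": "htdemucs_ft", "shifts": None, "overlap": None},
    "Best": {"model": "htdemucs_ft", "shifts": 5, "overlap": 0.5},
}


def build_demucs_args(input_path, out_dir, model="htdemucs", quality="Fast",
                      two_stems=None, mp3_bitrate=None, device="cpu"):
    """Return the argument list for a Demucs run (without the program name)."""
    preset = PRESETS[quality]
    # "Better" and "Best" override whatever model the user picked.
    chosen_model = preset["model"] or model
    args = ["-n", chosen_model, "-o", str(out_dir), "-d", device]
    if preset["shifts"] is not None:
        args += ["--shifts", str(preset["shifts"])]
    if preset["overlap"] is not None:
        args += ["--overlap", str(preset["overlap"])]
    if two_stems:
        # e.g. "vocals" splits into vocals and no_vocals
        args += ["--two-stems", two_stems]
    if mp3_bitrate:
        args += ["--mp3", "--mp3-bitrate", str(mp3_bitrate)]
    args.append(str(input_path))
    return args
```

For the device argument, a caller could pass "cuda" if torch.cuda.is_available() returns True and "cpu" otherwise, which matches the fallback behaviour described above.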

Click track generator

  • Detects tempo (BPM) and beat positions from your track.
  • Generates a click-only WAV and a preview mix of the click over the original audio.
  • Accented downbeats (assumes 4/4 time) with adjustable click volume.
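
To illustrate the click-synthesis step, here is a standard-library-only sketch that turns a list of beat times into a click track with accented downbeats. It is an assumption about how such a generator can work, not the project's generate_click_track implementation: the function names, click frequencies, click length, and the 0.6 off-beat gain are all made up for illustration. In the app the beat times would come from librosa's beat tracker (for example librosa.beat.beat_track followed by librosa.frames_to_time); here they are simply passed in.

```python
import math
import struct
import wave


def synth_click_track(beat_times, duration_s, sr=44100, volume=0.5, beats_per_bar=4):
    """Return mono float samples in [-1, 1] with a short click at each beat."""
    n_total = int(duration_s * sr)
    samples = [0.0] * n_total
    click_len = int(0.03 * sr)  # 30 ms per click (illustrative value)
    for i, t in enumerate(beat_times):
        # Assumes 4/4 and that the first detected beat is a downbeat.
        is_downbeat = i % beats_per_bar == 0
        freq = 1500.0 if is_downbeat else 1000.0
        gain = volume if is_downbeat else volume * 0.6
        start = int(t * sr)
        for k in range(click_len):
            idx = start + k
            if idx >= n_total:
                break
            envelope = math.exp(-k / (0.005 * sr))  # fast exponential decay
            samples[idx] += gain * envelope * math.sin(2 * math.pi * freq * k / sr)
    return samples


def write_wav(path, samples, sr=44100):
    """Write float samples as a 16-bit mono WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sr)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
        )
        wf.writeframes(frames)
```

A preview mix would then add these samples to the original audio at the same sample rate, with the volume argument controlling how loud the click sits in the mix.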

Requirements

  • Python ≥ 3.11
  • uv for dependency management
  • FFmpeg available on your PATH (used for audio decoding/encoding)
  • An NVIDIA GPU with CUDA 12.4 is optional but strongly recommended — the project is configured to install the CUDA 12.4 build of PyTorch. On a CPU-only machine, separation still works but is much slower.

Setup

# Install dependencies into a local .venv
uv sync

This installs Demucs, Gradio, librosa, soundfile, and the CUDA 12.4 builds of torch / torchaudio (configured in pyproject.toml).

Running

uv run python app.py

On Windows you can use the helper script:

.\run.ps1

Gradio prints a local URL (typically http://127.0.0.1:7860) — open it in your browser. Upload a file, pick your model/quality/mode, and click Separate. Each resulting stem appears as a labeled, downloadable audio player.

How it works

  • app.py — the Gradio UI and the separate / generate_click_track logic.
  • run_demucs.py — a thin wrapper that runs Demucs in a subprocess. It monkey-patches torchaudio.load to use the soundfile backend, bypassing torchcodec, and is invoked via uv run.
  • Separation runs Demucs as a subprocess writing to a temp directory; output stems are renamed to {songname}_{stem}.{ext} and routed to the matching UI slots.
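
The renaming step can be sketched as follows. This is a hypothetical helper, not the code in app.py: the name collect_stems and its parameters are assumptions, and it relies on Demucs' default output layout of {out}/{model}/{track}/{stem}.{ext}.

```python
from pathlib import Path
import shutil


def collect_stems(demucs_out, model, song_path, dest_dir):
    """Move Demucs output stems to dest_dir as {songname}_{stem}.{ext}.

    Returns a dict mapping stem name (e.g. "vocals") to the new path,
    which a UI layer could use to route each file to its slot.
    """
    song = Path(song_path).stem
    track_dir = Path(demucs_out) / model / song
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    renamed = {}
    for f in sorted(track_dir.iterdir()):
        if f.suffix.lower() not in {".wav", ".mp3"}:
            continue  # ignore anything that is not an audio stem
        target = dest / f"{song}_{f.stem}{f.suffix}"
        shutil.move(str(f), str(target))
        renamed[f.stem] = target
    return renamed
```

Keying the result by stem name keeps the routing independent of how many stems the chosen model or two-stem mode produced.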

Project layout

app.py            # Gradio app + separation / click-track logic
run_demucs.py     # Demucs subprocess wrapper (soundfile backend)
run.ps1           # Windows launcher
pyproject.toml    # Dependencies + CUDA 12.4 PyTorch index
uv.lock           # Locked dependency versions

Notes

  • The first run downloads the selected Demucs model weights, which may take a moment.
  • Higher quality presets and the htdemucs_ft / htdemucs_6s models are significantly slower, especially on CPU.
