torchao

Here are 11 public repositories matching this topic.

sayakpaul / diffusers-torchao

End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).

flux torch text-to-image diffusion-models torch-compile torchao architecture-optimization

Updated Jan 8, 2026
Python

vipulSharma18 / Survey-of-Quantization-Formats

A survey of modern quantization formats (e.g., MXFP8, NVFP4) and inference optimization tools (e.g., TorchAO, GemLite), illustrated through the example of Llama-3.1 inference.

quantization llama3 torchao gemlite

Updated Nov 11, 2025
Python

sayakpaul / diffusers-blackwell-quants

Easy recipes to reduce the inference latency of Flux, QwenImage, and LTX-2 with NVFP4 and MXFP8 on Blackwell.

pytorch image-gen diffusers video-gen torchao blackwell-gpu nvfp4 mxfp8

Updated Apr 10, 2026
Python

Dev-next-gen / flux-amd-rocm

FLUX.1-dev on AMD Radeon consumer GPUs: fast, low-VRAM, and shippable. Backport patches + benchmarks for torchao + diffusers group_offload on ROCm.

flux amd pytorch text-to-image rocm int8 diffusers rdna3 torchao group-offload

Updated Apr 19, 2026
Python

PRITHIVSAKTHIUR / Flux.2-Klein-Small-Decoder-Only

Flux.2-Klein-Small-Decoder-Only is an experimental, high-performance image generation and editing application built to use only the FLUX.2-klein-4B model, paired with the specialized FLUX.2-small-decoder Variational Autoencoder (VAE).

python transformers torch full-stack autoencoder vae gradio torchvision fastapi huggingface diffusion-models diffusers torchao flux-2-klein small-decoder

Updated May 18, 2026
Python

ParagEkbote / quantized-containerized-models

Deploy AI models with an API through quantization and containerization.

flux ai cog pre-commit torch pytest quantization replicate peft huggingface diffusers bitsandbytes unsloth torchao smollm3 pruna

Updated Jan 17, 2026
Python

Senecan-antiballisticmissile3616 / flux-amd-rocm

Run FLUX.1-dev on AMD Radeon GPUs using ROCm with backport patches, optimized scripts, and support for low-VRAM configurations.

react docker flux amd self-hosted pytorch music-generation text-to-image rocm int8 ai-music fastapi diffusers generative-ai ollama rdna3 torchao ace-step group-offload

Updated Jun 7, 2026
Python

jvoltci / breccia

Block-scaled FP8 / FP4 / INT4 tensor primitive with Triton scaled-matmul at FP32 parity on H100. NumPy / PyTorch / MLX / JAX backends.

machine-learning deep-learning numpy pytorch triton quantization mlx jax low-precision int4 fp8 fp4 torchao nvfp4 transformer-engine mxfp8

Updated May 24, 2026
Python

debajyotidasgupta / IdentityFlow

Identity-preserving image-to-video generation: vision-grounded prompt simplification via Qwen3-VL, Lightning LoRA 4-step inference, and SAM3-masked DINOv3 candidate reranking for fluid 720p video from a single reference image.

computer-vision sam pytorch video-generation image-to-video motion-synthesis diffusion-models text-to-video huggingface-transformers prompt-engineering generative-ai dinov2 video-diffusion identity-preservation torchao lora-fine-tuning wan2 cvpr2026

Updated Apr 6, 2026
Python

LaelaZorana / embodied-efficiency

Measuring what makes a VLA fast enough to run on the robot: a 5.9x CUDA-graph win, four experiments on why low-bit quantization doesn't deliver a comparable win, a budget-driven deploy-compiler, and a runtime safety supervisor. Live demo: hf.co/spaces/LaelaZ/embodied-efficiency

robotics cuda triton quantization vla embodied-ai torchao cuda-graphs

Updated Jun 7, 2026
Python

Wb-az / peft-qlora-text-classification

This repository contains code for benchmarking ModernBERT, RoBERTa, and OPT-350m on multi-class emotion classification using 8-bit quantization, backbone freezing, and LoRA-based PEFT.

scikit-learn python3 pytorch scipy lora quantization emotion-analysis plotly-express huggingface-transformers int8-quantization supervised-finetuning llm-inference llm-fine-tuning roberta-base peft-fine-tuning-llm torchao modernbert facebook-opt cochrans-q-test

Updated Jun 1, 2026
Python