
Pull requests: NVIDIA/TransformerEngine


Pull requests list

[PyTorch][torch.compile] Remove process group from quantizers
#3104 opened Jun 8, 2026 by pggPL Collaborator
3 of 12 tasks
Quantization support for GroupedTensor: FP8 per-tensor (label: community-contribution)
#3102 opened Jun 7, 2026 by int-smart Contributor
11 of 13 tasks
Fix GroupedLinear FP8 calibration loop (label: community-contribution)
#3101 opened Jun 6, 2026 by fallintoplace
Fix release wheel CUDA index calculation (label: community-contribution)
#3100 opened Jun 6, 2026 by fallintoplace
Introduce Mega-C++ to reduce CPU overhead (label: community-contribution)
#3099 opened Jun 6, 2026 by zhongbozhu Collaborator Draft
15 tasks
increased a bit tolerance for pytorch/distributed/run_numerics.py (label: community-contribution)
#3095 opened Jun 5, 2026 by francesco-bertolotti Contributor
6 of 13 tasks
NVFP4: cache GEMM-swizzled weight scale factors across micro-batches (label: community-contribution)
#3093 opened Jun 5, 2026 by cael-ling Contributor
3 of 13 tasks
Added thd cudnn guard (label: community-contribution)
#3092 opened Jun 5, 2026 by francesco-bertolotti Contributor
6 of 13 tasks
guarding max_logits fused attention for cudnn < 9.21.0 (label: community-contribution)
#3091 opened Jun 5, 2026 by francesco-bertolotti Contributor
6 of 13 tasks
Make NVTE tensor handle pool size configurable (label: community-contribution)
#3090 opened Jun 5, 2026 by lhb8125 Contributor Draft
fix(topk): fix UB and prevent vector load splitting in standalone_topk (label: community-contribution)
#3088 opened Jun 5, 2026 by solos
5 of 13 tasks
[JAX] Extend tensor inspect utility to dump out tensors in identifiable names
#3086 opened Jun 4, 2026 by tdophung Collaborator
6 of 13 tasks
[JAX] Fix norm workspace on global shapes
#3085 opened Jun 4, 2026 by jberchtold-nvidia Collaborator Draft
8 of 13 tasks
[JAX] MoEBlock tutorial
#3084 opened Jun 4, 2026 by jberchtold-nvidia Collaborator Draft
13 tasks
[JAX] Hopper BF16 grouped GEMM v2 support
#3083 opened Jun 4, 2026 by jberchtold-nvidia Collaborator Draft
8 of 13 tasks
add attention docs
#3081 opened Jun 4, 2026 by sudhakarsingh27 Member Draft
13 tasks
[PyTorch] Add joint forward-backward op fusion pass (label: enhancement)
#3080 opened Jun 4, 2026 by timmoon10 Member
8 of 13 tasks
[Common] Pack attention arguments as structs
#3079 opened Jun 3, 2026 by cyanguwa Collaborator Draft
13 tasks
[Pytorch] Add variable-K Cutlass GroupGEMM for fine-grained MoE wgrad (label: community-contribution)
#3069 opened Jun 1, 2026 by cassiewilliam Contributor
6 of 8 tasks
Optimize NVFP4 4over6 candidate error path (label: community-contribution)
#3068 opened Jun 1, 2026 by zianglih Contributor
9 of 13 tasks
[PyTorch] Propagate skip_fp8_weight_update in GroupedLinear during FP8 CUDA graph capture (label: community-contribution)
#3065 opened May 31, 2026 by LeSingh1 Contributor
fix unfused padding causal sdpa (label: community-contribution)
#3063 opened May 31, 2026 by hungryGeek16
[JAX] Grouped quant+GEMM custom partitioning rules
#3058 opened May 28, 2026 by jberchtold-nvidia Collaborator
8 of 13 tasks
[Common/PyTorch] bugfix: Token-linear fused RoPE impl. for THD tensors. (label: community-contribution)
#3057 opened May 28, 2026 by plugyawn
7 of 13 tasks