Pull requests: pytorch/torchtitan
add train file specification option to run_start.sh
CLA Signed
This label is managed by the Meta Open Source bot.
fb-exported
#1652
opened Aug 28, 2025 by
Shagun-G
code refactor : making key steps modular train_step()
CLA Signed
fb-exported
#1650
opened Aug 28, 2025 by
Shagun-G
[moe][compile] Turn capture_scalar_outputs off by default
CLA Signed
#1649
opened Aug 28, 2025 by
xmfan
[RFC] Support full bf16 training
CLA Signed
#1646
opened Aug 27, 2025 by
ebsmothers
Activation Checkpoint improvement
CLA Signed
#1645
opened Aug 27, 2025 by
fegin
[BE] Move NoParallel to torchtitan.distributed
CLA Signed
#1641
opened Aug 26, 2025 by
fegin
[WIP][DSV3] GroupedExperts weights conversion optimization
CLA Signed
#1639
opened Aug 25, 2025 by
wwwjn
[WIP] DCP: Dequantization and expert grouping for DSv3
CLA Signed
[DO NOT REVIEW] debug fsdp2 checkpoint for uneven sharding
CLA Signed
add option to use synthetic input data
CLA Signed
#1632
opened Aug 25, 2025 by
alfuyao1986
[wip] Distributed Scion/Muon
CLA Signed
#1630
opened Aug 25, 2025 by
rakkit
Enable multi rank safetensor consolidation
CLA Signed
#1625
opened Aug 22, 2025 by
ankitageorge
allow expert_parallel wrapper to handle kwargs
CLA Signed
#1620
opened Aug 22, 2025 by
rakkit
VLM: Onboarding native resolution, native aspect ratio, interleaved VLM training
CLA Signed
#1615
opened Aug 21, 2025 by
lkhphuc
1 task done
Bump version to v0.1.1
CLA Signed
#1606
opened Aug 20, 2025 by
wwwjn
workarounds for all2all autograd issues that Ruisi ran into
CLA Signed
#1604
opened Aug 20, 2025 by
bdhirsh
Wrap sync + a2a in a custom op
CLA Signed
high priority
module: activation checkpointing
release blocking
Issues that are blocking the milestone / release completion
#1597
opened Aug 19, 2025 by
soulitzer
[WIP] Activation Offloading with Separate Stream
CLA Signed
#1591
opened Aug 18, 2025 by
excelle08
Update SAC config to force save instead of recompute
CLA Signed
[WIP][DSV3] Remove keep a copy of GroupedExperts weight, free memory in StateDictAdapter
CLA Signed
#1585
opened Aug 16, 2025 by
wwwjn
Muon with 3D tensors
CLA Signed
#1584
opened Aug 16, 2025 by
byronxu99
Add config to AC to toggle early-stop and revert A2A autograd.Function workaround
ci-no-td
CLA Signed
#1580
opened Aug 15, 2025 by
soulitzer
[EP] add initial support for NVSHMEM-based all-to-all
CLA Signed
#1569
opened Aug 14, 2025 by
tianyu-l
[Do Not Land] Debug for SDPA + CP nan issue in DeepSeekV3
CLA Signed
Multinode SkyPilot example
CLA Signed
#1564
opened Aug 13, 2025 by
alex000kim