| Topic | Replies | Views | Activity |
|---|---|---|---|
| About the distributed category | 2 | 2857 | November 28, 2025 |
| Fully_shard with 2D mesh (4,1) still runs all-gather / reduce-scatter on the shard dimension | 0 | 4 | February 5, 2026 |
| Is torch Muon optimizer compatible with FSDP/HSDP? | 0 | 17 | February 3, 2026 |
| FSDP2 post backward hook registration | 2 | 16 | January 31, 2026 |
| FSDP: Can users control which parameters are offloaded to CPU? | 0 | 14 | January 30, 2026 |
| Difference between torch.cuda.synchronize() and dist.barrier() | 3 | 4841 | January 29, 2026 |
| Runtime error raised in DDP when using .detach() to skip gradient computation in some DP ranks | 2 | 36 | January 28, 2026 |
| FSDP2 vs DDP gradient mismatch on Embeddings (Flex Attention + Compile) | 0 | 38 | January 27, 2026 |
| [Distributed w/ TorchTitan] Introducing Async Tensor Parallelism in PyTorch | 12 | 17488 | January 27, 2026 |
| Multi GPU training on single node with DistributedDataParallel | 3 | 5430 | January 27, 2026 |
| 8xH100 training issue | 4 | 117 | January 20, 2026 |
| DDP doesn't run unless TORCH_DISTRIBUTED_DEBUG=DETAIL is enabled | 1 | 35 | January 15, 2026 |
| Can multiprocessing.Lock / Condition be used with torchrun? | 1 | 27 | January 11, 2026 |
| P2P disbale not working | 6 | 84 | January 2, 2026 |
| Node 0 cannot connect to itself | 2 | 56 | December 1, 2025 |
| DDP: model not synchronizing across gpu's | 8 | 5572 | November 28, 2025 |
| Help with DDP in kaggle notebook | 2 | 301 | November 26, 2025 |
| Optimizer_state_dict with multiple optimizers in FSDP | 1 | 122 | November 20, 2025 |
| Alternating Parameters in DDP | 1 | 271 | November 17, 2025 |
| In a multi-GPU DDP environment, if the loss on one rank is NaN while the others are normal, could this cause the all-reduce to hang? | 1 | 55 | November 12, 2025 |
| RPC cannot run in jetson orin because of the specific uuid of orin | 3 | 96 | November 11, 2025 |
| Distributed Training causes model to output NaN values after resuming from snapshot | 0 | 29 | November 7, 2025 |
| Pipeline Parallelism performance with distributed-rpc on Jetson Nano devices | 3 | 1151 | November 6, 2025 |
| Problem: Pipeline Parallelism with distributed-rpc on Jetson Nano devices | 1 | 225 | October 28, 2025 |
| FSDP2 and gradient w.r.t. inputs | 2 | 103 | October 28, 2025 |
| Using Symmetric Memory One Shot All Reduce | 1 | 736 | October 27, 2025 |
| Tensor parallelism in image models like Unet | 4 | 521 | October 27, 2025 |
| Windows DDP on RTX 50-series only: use_libuv was requested but PyTorch was built without libuv support (works on 40/20-series) | 0 | 393 | October 25, 2025 |
| CPU thread slow to enqueue GPU and communication kernels | 2 | 98 | October 20, 2025 |
| Get `state_dict` from `DataDistributedParallel` model while other thread is running `backward` | 0 | 31 | October 19, 2025 |