| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| About the distributed category | 1 | 2819 | January 20, 2021 |
| Windows DDP on RTX 50-series only: use_libuv was requested but PyTorch was built without libuv support (works on 40/20-series) | 0 | 12 | October 25, 2025 |
| CPU thread slow to enqueue GPU and communication kernels | 2 | 32 | October 20, 2025 |
| Get `state_dict` from `DataDistributedParallel` model while other thread is running `backward` | 0 | 10 | October 19, 2025 |
| Suggested design for multiprocess federated learning | 1 | 457 | October 13, 2025 |
| Use fsdp training, 80 h800 gpu can run success, but 160 h800 gpu oom | 0 | 9 | October 11, 2025 |
| I am running the below code, which is wrong, but still the torch run command runs without any errors? How do I debug this? | 3 | 43 | October 6, 2025 |
| Using debugpy with DDP results in driver leaking GPU memory | 1 | 23 | October 2, 2025 |
| Model.to(device) vs. tensor.to(device) | 3 | 179 | September 28, 2025 |
| Does torch support custom stream for nccl commucation now? | 5 | 67 | September 28, 2025 |
| "Cannot allocate memory" for multinode training | 2 | 25 | September 23, 2025 |
| How to apply selective activation checkpointing on _grouped_mm | 0 | 41 | September 20, 2025 |
| DistributedDataParallel init hangs | 1 | 178 | September 20, 2025 |
| Process stuck by the dist.barrier() using DDP after dist.init_process_group | 2 | 468 | September 20, 2025 |
| Proper way to call torch.distributed.send/recv | 4 | 62 | September 18, 2025 |
| Understanding relation of FSDP and TP | 0 | 36 | September 16, 2025 |
| Support for Ulysses/Ring distributed attention for long-context training (32k) for 32B dense models | 0 | 64 | September 15, 2025 |
| WebDataset Multi-GPU Single-Node | 3 | 173 | September 15, 2025 |
| DDP overwriting a buffer with random values | 1 | 24 | September 15, 2025 |
| DDP: model not synchronizing across gpu's | 7 | 5342 | September 14, 2025 |
| Low-level errors when retrying training after OOMs | 3 | 62 | September 12, 2025 |
| Proper way to combine Tensor subclass with FSDP | 2 | 39 | September 8, 2025 |
| Cannot execute loss.backward() for training a specific layer | 1 | 20 | September 8, 2025 |
| DDP does not work with custom gradient (backward) computations | 3 | 52 | September 5, 2025 |
| Avoid OOM due to optimizer state in DDP | 6 | 71 | September 4, 2025 |
| Work vs. Future sync primitives for Distributed Torch backends | 2 | 62 | September 4, 2025 |
| Does FSDP2 support shared modules | 1 | 60 | September 2, 2025 |
| OOM When Resuming From Checkpoint XLA | 0 | 22 | September 1, 2025 |
| Multi-GPU training hangs: Watchdog caught collective operation timeout | 16 | 16315 | August 31, 2025 |
| Zero optimizer.consolidate_state_dict(to=0) hangs | 3 | 27 | August 31, 2025 |