NCCL WARN Cuda failure 'out of memory' after multiple hours of DDP training

Oh, I need to edit that as well. The “leak” also occurred there, just much smaller: I only used 2 GPUs and trained for only 10 epochs. Due to rounding, it looked like the leak did not appear, but it did.
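For what it's worth, logging the raw byte count (e.g. from `torch.cuda.memory_allocated()`) instead of a figure rounded to GiB makes even a small leak visible. A minimal sketch of how rounding can mask slow growth, with made-up numbers (the ~2 MB/epoch leak rate is purely hypothetical):

```python
# Hypothetical per-epoch memory readings in bytes: a ~4 GB baseline
# plus an assumed leak of ~2 MB per epoch over 10 epochs.
readings = [int(4e9 + epoch * 2e6) for epoch in range(10)]

# What a monitor rounded to 0.1 GiB would display: every epoch looks identical.
rounded_gib = [round(b / 2**30, 1) for b in readings]
print(rounded_gib)  # the rounded view shows no growth at all

# The exact byte counts reveal the steady growth across epochs.
growth = readings[-1] - readings[0]
print(growth)  # 18000000 bytes leaked over the run
```

With only 10 epochs the total growth stays below the rounding granularity, which is why the leak was easy to miss on the smaller run.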