CUDA: Out of memory error when using multi-gpu

DDP is not only used for multi-node training; it also speeds up single-node multi-GPU workloads.
The current proposal is to deprecate DataParallel and, accordingly, to ramp up the documentation on DDP.
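
For reference, here is a minimal sketch of single-node multi-GPU training with DistributedDataParallel, assuming a toy `nn.Linear` model, random tensor data, and a `torchrun` launch (the model, data, and script name are placeholders, not part of the original report):

```python
# Minimal single-node multi-GPU DDP sketch (hypothetical toy model and random data).
# Launch with: torchrun --nproc_per_node=<num_gpus> ddp_example.py
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    # torchrun sets LOCAL_RANK, RANK, and WORLD_SIZE for each spawned process.
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # One model replica per process, each pinned to its own GPU.
    model = nn.Linear(32, 2).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # DistributedSampler gives each process a disjoint shard of the dataset.
    dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for inputs, targets in loader:
            inputs = inputs.cuda(local_rank, non_blocking=True)
            targets = targets.cuda(local_rank, non_blocking=True)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()  # gradients are all-reduced across GPUs here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Unlike DataParallel, which replicates the model from a single process and can concentrate memory pressure on GPU 0, each DDP process holds only its own replica and batch shard, which is one reason it tends to avoid the out-of-memory behavior described above.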