I am adapting my reinforcement learning project from single-GPU training to multi-GPU training.
For single-GPU training with a batch size of 10, GPU memory usage is around 800MB.
However, for multi-GPU training with the same batch size of 10 split across two GPUs (5 examples per GPU), memory usage first jumps to 7GB, then keeps increasing slowly, and eventually I get what looks like a memory leak.
Since I had to adapt my code from single GPU to multiple GPUs, it now contains a lot of reshape, transpose, torch.cat, squeeze, and unsqueeze operations on my variables. Do any of these tensor operations create many duplicate tensors, which could lead to the memory leak? If there are many duplicates of the variables, will they affect my .backward() call? I noticed that the memory leak always shows up at the line with the .backward() call.
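
To make the question about duplicates concrete, here is a minimal sketch of how one could check which of these operations return a view of the same memory and which allocate a new tensor. It assumes a reasonably recent PyTorch and runs on CPU tensors; the helper name `shares_memory` and the tensor shapes are made up for illustration and are not taken from my project. It only checks forward-pass allocation and says nothing about what autograd keeps alive for .backward().

```python
import torch


def shares_memory(a: torch.Tensor, b: torch.Tensor) -> bool:
    """Hypothetical helper: True if b starts at the same memory address as a.

    This is enough to detect views for the operations below, since none of
    them shift the starting offset of the data.
    """
    return a.data_ptr() == b.data_ptr()


x = torch.zeros(5, 1, 4)  # contiguous tensor, shape chosen arbitrarily

results = {
    # These return views of x: no new data is allocated.
    "reshape": shares_memory(x, x.reshape(5, 4)),
    "transpose": shares_memory(x, x.transpose(0, 2)),
    "squeeze": shares_memory(x, x.squeeze(1)),
    "unsqueeze": shares_memory(x, x.unsqueeze(0)),
    # reshape on a non-contiguous tensor cannot be a view, so it copies.
    "reshape_after_transpose": shares_memory(x, x.transpose(0, 2).reshape(-1)),
    # torch.cat always writes its inputs into a newly allocated tensor.
    "cat": shares_memory(x, torch.cat([x, x], dim=0)),
}

for name, is_view in results.items():
    print(f"{name}: {'view (shared memory)' if is_view else 'copy (new memory)'}")
```

In this sketch only torch.cat and the reshape of a transposed tensor allocate new memory; the others share storage with the original tensor.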