I’m training a language model using the code here: https://github.com/pytorch/examples/blob/master/word_language_model/main.py
I have made some slight changes so that the model can be trained across multiple GPUs. However, the GPU memory usage is extremely imbalanced.
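For reference, here is roughly the kind of change I made (just a sketch with a made-up stand-in model and layer sizes, not my exact diff against main.py):

```python
import torch
import torch.nn as nn

# Stand-in for the RNNModel that main.py builds (the sizes here are made up).
model = nn.LSTM(input_size=200, hidden_size=200, num_layers=2, batch_first=True)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)

if torch.cuda.device_count() > 1:
    # This is essentially my change: replicate the model so each forward
    # pass scatters the batch (along dim 0) across all visible GPUs.
    model = nn.DataParallel(model)

# Training then proceeds as in main.py. The per-replica outputs are
# gathered back onto cuda:0, and that is where the memory piles up.
```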
I understand that one GPU is used to gather and store all the outputs. Is there any way to balance the memory usage? Or can I dedicate one GPU to gathering the outputs and use the rest for training on the batches?
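To make the second question more concrete, this is the kind of thing I'm imagining (purely hypothetical, reusing `model` from the sketch above; I'm not sure DataParallel actually supports this or whether it would help):

```python
import torch.nn as nn

# Hypothetical: run the replicas on GPUs 1-3 only and gather the outputs
# on GPU 0, so GPU 0's memory is reserved for the gathered outputs.
# I don't know whether output_device can sit outside device_ids like this.
model = nn.DataParallel(model, device_ids=[1, 2, 3], output_device=0)
```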
Thanks,
Yuzhou