High memory usage in seq2seq model

Hi,
I am running a seq2seq model, but it quickly runs out of memory. I checked some related questions, and some suggested detaching the hidden state. I want to ask whether detaching affects the calculation of the NLL loss and backpropagation, and whether there is any other way to help reduce memory usage.
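
For context, this is the kind of detach I mean. A toy check (made-up sizes, not my actual model) suggests the loss value itself is unchanged and only the gradient path into earlier steps is cut:

```python
import torch
import torch.nn as nn

# Toy check of what detaching the hidden state does (made-up sizes, not my real model).
rnn = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
log_softmax = nn.LogSoftmax(dim=-1)
criterion = nn.NLLLoss()

x_prev = torch.randn(2, 5, 8)             # an earlier segment of the sequence
x_cur = torch.randn(2, 5, 8)              # the current segment
target = torch.randint(0, 16, (2 * 5,))   # fake targets over 16 "classes"

_, hidden = rnn(x_prev)

# Without detach: backward() from this loss would reach into x_prev's graph.
out, _ = rnn(x_cur, hidden)
loss_full = criterion(log_softmax(out).reshape(-1, 16), target)

# With detach: the loss value is identical; only the gradient path into the
# earlier segment is cut, so the earlier graph can be freed.
hidden = tuple(h.detach() for h in hidden)
out, _ = rnn(x_cur, hidden)
loss_detached = criterion(log_softmax(out).reshape(-1, 16), target)

print(loss_full.item(), loss_detached.item())   # same value
```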

Are you running your code on CPU, CUDA w/ cuDNN, or CUDA w/o cuDNN? Are you running out of GPU memory or CPU memory?

On CUDA 7.5, and I run out of GPU memory.

What about cuDNN? What GPU model do you have? Which RNN architecture are you using?

My cuDNN version is 5.0.5 and my GPU is a Titan X. I use a bidirectional LSTM, and the length of the input text is 100.

Thanks for providing the information. That is weird; assuming the input and batch dimensions are not too big, this shouldn't happen. Are you able to run one iteration, or is the memory usage increasing with each iteration?
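
One quick way to tell, assuming your PyTorch build has `torch.cuda.memory_allocated`, is to print the allocated GPU memory each iteration, roughly like this (model, data, and loss are stand-ins):

```python
import torch
import torch.nn as nn

# Stand-in model and data, just to show the per-iteration memory check.
model = nn.LSTM(input_size=32, hidden_size=64, batch_first=True).cuda()
batches = [torch.randn(4, 100, 32, device="cuda") for _ in range(5)]

for i, batch in enumerate(batches):
    output, _ = model(batch)
    loss = output.sum()     # placeholder loss
    loss.backward()
    model.zero_grad()
    # memory_allocated() reports bytes currently held by tensors on the GPU.
    # If this keeps climbing across iterations, something is holding on to
    # old graphs (e.g. a hidden state that was never detached).
    print("iter %d: %.1f MB allocated" % (i, torch.cuda.memory_allocated() / 1024 ** 2))
```

If the number is stable but already near the card's limit, the model itself is simply that large; if it grows every iteration, you have a reference leak.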

It seems that reducing the batch size solves this problem. Maybe truncated BPTT can help reduce memory usage further.
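
Something like this chunked loop is what I have in mind for truncated BPTT (a rough sketch with placeholder sizes and loss, using a unidirectional LSTM for simplicity):

```python
import torch
import torch.nn as nn

# Rough sketch of truncated BPTT: split the 100-step input into chunks and
# detach the hidden state between chunks, so only one chunk's graph is kept
# in memory at a time.
rnn = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)
optimizer = torch.optim.SGD(rnn.parameters(), lr=0.1)

batch = torch.randn(16, 100, 64)      # (batch, seq_len, input_dim), made-up sizes
chunk_size = 20
hidden = None

for start in range(0, batch.size(1), chunk_size):
    chunk = batch[:, start:start + chunk_size]
    if hidden is not None:
        # Stop backprop at the chunk boundary so earlier graphs can be freed.
        hidden = tuple(h.detach() for h in hidden)
    output, hidden = rnn(chunk, hidden)
    loss = output.pow(2).mean()       # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```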

Okay, then it is likely that your model is just using that much memory.