Sorry for spawning yet another memory leak thread. I’ve gone through the previous ones and didn’t find that they were the same issue (or version). But perhaps I’m mistaken.
Following up on a few previous posts of mine (specifically on training seq2seq models, and the fact that LSTMCells aren’t cuda enabled), I’ve come to a place where I have to iterate a sequence one step at a time through an LSTM layer to generate my seq2seq output (this is the motivation for the code below, which corresponds to the decoder half of a VRAE).
So I’ve sprung a dreaded memory leak, and I’m not sure why. Here’s my minimal code:
import datetime
import gc
import torch
import torch.nn as nn
from torch.autograd import Variable

test_lstm = nn.LSTM( 5, 512, 2 ).double().cuda()
test_ll = nn.Linear( 512, 5 ).double().cuda()

h_t = Variable( torch.cuda.DoubleTensor(2, 1, 512), requires_grad=False )
c_t = Variable( torch.cuda.DoubleTensor(2, 1, 512), requires_grad=False )
out = Variable( torch.cuda.DoubleTensor(1, 1, 5), requires_grad=False )

out, (h_t, c_t) = test_lstm( out, (h_t, c_t) )   # <- warm-up run - first memory reading
out = test_ll( out.squeeze(1) ).unsqueeze(1)

for i in range( 200 ):
    # feed the previous output back in, one step at a time
    out, (h_t, c_t) = test_lstm( out, (h_t, c_t) )
    out = test_ll( out.squeeze(1) ).unsqueeze(1)
    print( "%d %d" % (i, datetime.datetime.now().microsecond) )
    gc.collect()
This code works without a hitch on the CPU.
On the GPU, process memory consumption is 189 MiB at start, 413 MiB after the warm-up call, and then I get the following output:
1 780961
2 803360
...
66 886347
67 903248
68 921229
THCudaCheck FAIL file=/py/conda-bld/pytorch_1490980628440/work/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
File "/usr/bin/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-27-226b10408002>", line 2, in <module>
out, (h_t, c_t) = test_lstm( out , (h_t, c_t) )
File "/usr/bin/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/usr/bin/anaconda3/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 91, in forward
output, hidden = func(input, self.all_weights, hx)
File "/usr/bin/anaconda3/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 327, in forward
return func(input, *fargs, **fkwargs)
File "/usr/bin/anaconda3/lib/python3.6/site-packages/torch/autograd/function.py", line 202, in _do_forward
flat_output = super(NestedIOFunction, self)._do_forward(*flat_input)
File "/usr/bin/anaconda3/lib/python3.6/site-packages/torch/autograd/function.py", line 224, in forward
result = self.forward_extended(*nested_tensors)
File "/usr/bin/anaconda3/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 269, in forward_extended
cudnn.rnn.forward(self, input, hx, weight, output, hy)
File "/usr/bin/anaconda3/lib/python3.6/site-packages/torch/backends/cudnn/rnn.py", line 247, in forward
fn.weight_buf = x.new(num_weights)
RuntimeError: cuda runtime error (2) : out of memory at /py/conda-bld/pytorch_1490980628440/work/torch/lib/THC/generic/THCStorage.cu:66
and at this point I’m pegged at 4 GiB of GPU memory, all within the span of about 140 ms.
Nvidia driver: 375.39
nvcc: 8.0, V8.0.61
PyTorch: 0.1.11+27fb875
Moreover, nothing I do at this point frees that memory and I have to respawn my process.
Edit: running this with the Variables marked as volatile=True instead of requires_grad=False doesn’t run into the memory problem.
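For reference, this is roughly the variant that doesn’t blow up for me - just a minimal sketch of the same loop on the old Variable API, assuming the same test_lstm/test_ll setup as above, with only the Variable construction changed:

# Same model objects as above; only the Variable flags change.
h_t = Variable( torch.cuda.DoubleTensor(2, 1, 512), volatile=True )
c_t = Variable( torch.cuda.DoubleTensor(2, 1, 512), volatile=True )
out = Variable( torch.cuda.DoubleTensor(1, 1, 5), volatile=True )

for i in range( 200 ):
    # volatile inputs tell autograd not to record a graph for each step,
    # so nothing accumulates across iterations
    out, (h_t, c_t) = test_lstm( out, (h_t, c_t) )
    out = test_ll( out.squeeze(1) ).unsqueeze(1)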
My hope is that I’m doing something stupid. Let me know if you have any questions about setup.