Sorry for spawning yet another memory-leak thread. I’ve gone through the previous ones and none of them seem to cover the same issue (or version), but perhaps I’m mistaken.
Following up on a few previous posts of mine (specifically on training seq2seq models, and the fact that LSTMCells aren’t CUDA-enabled), I’ve reached a point where I have to feed a sequence through an LSTM layer one step at a time to generate my seq2seq output. That is the motivation for the code below, which corresponds to the decoder half of a VRAE.
So I’ve sprung a dreaded memory leak, and I’m not sure why. Here’s my minimal code:
```python
import datetime
import gc

import torch
import torch.nn as nn
from torch.autograd import Variable

test_lstm = nn.LSTM(5, 512, 2).double().cuda()
test_ll = nn.Linear(512, 5).double().cuda()

h_t = Variable(torch.cuda.DoubleTensor(2, 1, 512), requires_grad=False)
c_t = Variable(torch.cuda.DoubleTensor(2, 1, 512), requires_grad=False)
out = Variable(torch.cuda.DoubleTensor(1, 1, 5), requires_grad=False)

# warm up the run -- the first memory reading is taken here
out, (h_t, c_t) = test_lstm(out, (h_t, c_t))
out = test_ll(out.squeeze(1)).unsqueeze(1)

for i in range(200):
    out, (h_t, c_t) = test_lstm(out, (h_t, c_t))
    out = test_ll(out.squeeze(1)).unsqueeze(1)
    print("%d %d" % (i, datetime.datetime.now().microsecond))
    gc.collect()
```
This code works without a hitch on the CPU.
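For completeness, the CPU variant I mean is the same code with the `.cuda()` calls dropped, roughly:

```python
# CPU variant of the same loop (a sketch); this one runs without issue.
test_lstm = nn.LSTM(5, 512, 2).double()
test_ll = nn.Linear(512, 5).double()

h_t = Variable(torch.DoubleTensor(2, 1, 512), requires_grad=False)
c_t = Variable(torch.DoubleTensor(2, 1, 512), requires_grad=False)
out = Variable(torch.DoubleTensor(1, 1, 5), requires_grad=False)

for i in range(200):
    out, (h_t, c_t) = test_lstm(out, (h_t, c_t))
    out = test_ll(out.squeeze(1)).unsqueeze(1)
```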
On the GPU, the process consumes 189 MiB at start, 413 MiB at the first memory reading (after the warmup call), and then I get the following output:
```
1 780961
2 803360
...
66 886347
67 903248
68 921229
THCudaCheck FAIL file=/py/conda-bld/pytorch_1490980628440/work/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "/usr/bin/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-27-226b10408002>", line 2, in <module>
    out, (h_t, c_t) = test_lstm( out , (h_t, c_t) )
  File "/usr/bin/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/bin/anaconda3/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 91, in forward
    output, hidden = func(input, self.all_weights, hx)
  File "/usr/bin/anaconda3/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 327, in forward
    return func(input, *fargs, **fkwargs)
  File "/usr/bin/anaconda3/lib/python3.6/site-packages/torch/autograd/function.py", line 202, in _do_forward
    flat_output = super(NestedIOFunction, self)._do_forward(*flat_input)
  File "/usr/bin/anaconda3/lib/python3.6/site-packages/torch/autograd/function.py", line 224, in forward
    result = self.forward_extended(*nested_tensors)
  File "/usr/bin/anaconda3/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 269, in forward_extended
    cudnn.rnn.forward(self, input, hx, weight, output, hy)
  File "/usr/bin/anaconda3/lib/python3.6/site-packages/torch/backends/cudnn/rnn.py", line 247, in forward
    fn.weight_buf = x.new(num_weights)
RuntimeError: cuda runtime error (2) : out of memory at /py/conda-bld/pytorch_1490980628440/work/torch/lib/THC/generic/THCStorage.cu:66
```
At this point, the process is pegged at 4 GiB of GPU memory, and it got there in the span of roughly 140 ms (the microsecond stamps run from 780961 at step 1 to 921229 at step 68).
The Nvidia driver version is 375.39; nvcc is release 8.0, V8.0.61.
Moreover, nothing I do at this point frees that memory, and I have to respawn the process.
Edit: running this with the Variables marked volatile=True instead of requires_grad=False does not run into the memory problem.
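Concretely, the non-leaking variant looks like this (a sketch; in this PyTorch version, volatile propagates through the graph, so marking the inputs is enough):

```python
# Same loop, but with the inputs created as volatile Variables
# (inference mode) instead of requires_grad=False -- this does not leak.
test_lstm = nn.LSTM(5, 512, 2).double().cuda()
test_ll = nn.Linear(512, 5).double().cuda()

h_t = Variable(torch.cuda.DoubleTensor(2, 1, 512), volatile=True)
c_t = Variable(torch.cuda.DoubleTensor(2, 1, 512), volatile=True)
out = Variable(torch.cuda.DoubleTensor(1, 1, 5), volatile=True)

for i in range(200):
    out, (h_t, c_t) = test_lstm(out, (h_t, c_t))
    out = test_ll(out.squeeze(1)).unsqueeze(1)
```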
My hope is that I’m doing something stupid. Let me know if you have any questions about setup.