RNN module weights are not part of single contiguous chunk of memory

(ProKil) #1

When I pass input to nn.GRU, I run into this problem:

UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greately increasing memory usage. To compact weights again call flatten_parameters().
output, h_n = self.gru(concatenated_input.transpose(0, 1), h_0)
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THC/generic/THCStorage.cu:66

What could cause this problem?

(ProKil) #2

I solved the OOM problem by setting the input Variable to be volatile, but the “not compact” warning remains unsolved.
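
For readers on PyTorch 0.4 or later: the volatile flag was removed there, and torch.no_grad() is its replacement for inference-only code. A minimal sketch, assuming a GRU like the one in the first post; the tensor sizes are made up for illustration:

```python
import torch
import torch.nn as nn

# Illustrative sizes only; they are not taken from the thread.
gru = nn.GRU(input_size=8, hidden_size=16)
inputs = torch.randn(5, 4, 8)      # (seq_len, batch, input_size)
h_0 = torch.zeros(1, 4, 16)        # (num_layers, batch, hidden_size)

# Inside no_grad() no autograd graph is recorded, so the intermediate
# activations are not kept alive for a backward pass.
with torch.no_grad():
    output, h_n = gru(inputs, h_0)

requires_grad = output.requires_grad
output_shape = tuple(output.shape)
```

This only suits evaluation; during training the graph is needed for the backward pass.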

(Wendy Shang) #3

Hi, did you try

output, h_n = self.gru(concatenated_input.transpose(0, 1), h_0)

(ProKil) #4

I’ve tried that, but it doesn’t work.

(Diogo Pernes) #5

This is happening not only with GRU but also with (vanilla) RNN and LSTM. I also have no idea why…

(Younggun Lee) #6

I have this problem too.
In my case, this problem occurs only when I wrap the whole model with DataParallel.

What is the proper usage of flatten_parameters() and what does it do?
Should I call flatten_parameters at initialization or at every forward step?

(Wendy Shang) #7

Sorry for the late reply.

This is the code for flatten_parameters. It makes the memory occupied by the parameters and their gradients contiguous.

However, I eventually solved this issue by deleting all the intermediate variables, such as h_n and c_n (for LSTM).


How should I use flatten_parameters() to fix the problem? Where am I supposed to put the code?

(Yu Li) #9

Hi, was your problem solved? I have the same question and I’m looking forward to the answer.

(Kostas) #10

Hi, any news on that warning? I’m using GRUs and I have the same issue.

(norm) #11

Same issue… What changes have to be made in the code?


I also faced the same problem. But in my case, the warning only appeared on the first call of the LSTM (forward), and GPU and CPU memory didn’t increase.
I ran into this problem after I changed the trainable parameters myself.

(Bigeye Destroyer) #13

I also hit this problem when trying to stack a few single-layer RNNs to build a multi-layer one. My code looks like this:

self.rnns = [nn.LSTM(input_size=input_size if l == 0 else 2 * cell_size,
             hidden_size=cell_size, num_layers=1, batch_first=True,
             bidirectional=True) for l in range(num_layers)]

And I solved the problem by adding the following line:

self.rnns = torch.nn.ModuleList(self.rnns)
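
A minimal sketch of what that wrapper changes, assuming a hypothetical StackedLSTM class around the snippet above (the class name, the register switch, and the sizes are illustrative). Modules kept in a plain Python list are not registered as submodules, so their parameters are invisible to model.parameters() and are not moved by model.cuda(); nn.ModuleList registers them:

```python
import torch
import torch.nn as nn


class StackedLSTM(nn.Module):
    """Hypothetical wrapper: stacks single-layer bidirectional LSTMs."""

    def __init__(self, input_size, cell_size, num_layers, register=True):
        super().__init__()
        rnns = [
            nn.LSTM(
                input_size=input_size if l == 0 else 2 * cell_size,
                hidden_size=cell_size,
                num_layers=1,
                batch_first=True,
                bidirectional=True,
            )
            for l in range(num_layers)
        ]
        # nn.ModuleList registers each LSTM as a submodule;
        # a plain list leaves them unregistered.
        self.rnns = nn.ModuleList(rnns) if register else rnns

    def forward(self, x):
        # Feed the output of each layer into the next one.
        for rnn in self.rnns:
            x, _ = rnn(x)
        return x


plain = StackedLSTM(8, 16, num_layers=2, register=False)
registered = StackedLSTM(8, 16, num_layers=2, register=True)

# Count the parameter tensors each variant exposes.
n_plain = len(list(plain.parameters()))
n_registered = len(list(registered.parameters()))

out = registered(torch.randn(4, 5, 8))   # (batch, seq_len, input_size)
out_shape = tuple(out.shape)
```

Comparing n_plain with n_registered shows whether an optimizer built from model.parameters() would see the LSTM weights at all.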

(Kaiyang) #14

I met the same problem and solved it by adding

self.rnn.flatten_parameters()

into

def forward(self, x)

right before the call

rnn_output, _ = self.rnn(x, h)

After that, the error never showed up again.
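
A minimal sketch of where such a call could sit, assuming the line in question is self.rnn.flatten_parameters() (the call named later in this thread). The Encoder class, its sizes, and the zero initial state are hypothetical; only self.rnn, forward(self, x), and the rnn_output line come from the post:

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Hypothetical module used only to show the placement of the call."""

    def __init__(self, input_size=8, hidden_size=16):
        super().__init__()
        self.hidden_size = hidden_size
        self.rnn = nn.GRU(input_size, hidden_size, batch_first=True)

    def forward(self, x):
        # Ask the RNN to re-compact its weights before using them.
        # For CPU weights the call returns without doing anything,
        # so the same code runs with or without a GPU.
        self.rnn.flatten_parameters()
        # Zero initial state: (num_layers, batch, hidden_size).
        h = x.new_zeros(1, x.size(0), self.hidden_size)
        rnn_output, _ = self.rnn(x, h)
        return rnn_output


model = Encoder()
rnn_output = model(torch.randn(4, 5, 8))   # (batch, seq_len, input_size)
output_shape = tuple(rnn_output.shape)
```

Calling it in forward rather than once in __init__ means it also runs on the per-GPU replicas that DataParallel creates, which is the setup where several posters here saw the warning.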


This solves the problem, though I don’t know how this function works…


I also tried your suggestion, but I get a different error message.

  File "/home/.local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
    raise output
  File "/home/.local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
    output = module(*input, **kwargs)
  File "/home/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/remote/cnn-daily/models/seq2seq.py", line 170, in forward
    final_dist, sent_dist, word_dist, coverage, target = self.train_model(src_extend_pad, src_len, sent_num_, src_pad_mask, tgt_extend_pad, max_oovs)
  File "/home/remote/cnn-daily/models/seq2seq.py", line 129, in train_model
    sentence_states, sentence_contexts, contexts, features, coverage = self.hierarchy(src_extend_pad, src_len, sent_num)
  File "/home/remote/cnn-daily/models/seq2seq.py", line 108, in hierarchy
    _contexts_, _features_, _states_ = self.encoder(_src_extend_pad_, length_.tolist())
  File "/home/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/remote/cnn-daily/models/rnn.py", line 287, in forward
    hiddens, states = self.rnn(embs)
  File "/home/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/.local/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 192, in forward
    output, hidden = func(input, self.all_weights, hx, batch_sizes)
  File "/home/.local/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 323, in forward
    return func(input, *fargs, **fkwargs)
  File "/home/.local/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 287, in forward
RuntimeError: torch/csrc/autograd/variable.cpp:115: get_grad_fn: Assertion `output_nr == 0` failed.

If I remove self.rnn.flatten_parameters(), this error is not raised.
Could you give some suggestions?