RNN module weights are not part of single contiguous chunk of memory

When I pass input to nn.GRU, I run into the following problem:

UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
output, h_n = self.gru(concatenated_input.transpose(0, 1), h_0)
...
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THC/generic/THCStorage.cu:66

What could cause this problem?


I solved the OOM problem by setting the input Variable to be volatile, but this “not compact” problem remains unsolved.

Hi, did you try

self.gru.flatten_parameters()
output, h_n = self.gru(concatenated_input.transpose(0, 1), h_0)

I’ve tried that, but it doesn’t work.

This happens not only with GRU but also with (vanilla) RNN and LSTM. I also have no idea why…


I have this problem too.
In my case, this problem occurs only when I wrap the whole model with DataParallel.

@WendyShang
What is the proper usage of flatten_parameters() and what does it do?
Should I call flatten_parameters at initialization or at every forward step?


Sorry for the late reply.

This is the code for flatten_parameters. It makes the memory occupied by the parameters and their gradients contiguous.

However, I eventually solved this issue by deleting all the intermediate variables, such as h_n, c_n, etc. (for LSTM).
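
For reference, a minimal sketch of both ideas (the module, sizes, and variable names here are illustrative, not the actual code from this thread):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=32, hidden_size=64, num_layers=2).cuda()
lstm.flatten_parameters()                   # re-compact the weights after moving to the GPU

x = torch.randn(10, 8, 32, device="cuda")   # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)
loss = output.sum()
loss.backward()

del output, h_n, c_n                        # drop intermediates so their memory can be reused
torch.cuda.empty_cache()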

How should I use flatten_parameters() to fix the problem? Where am I supposed to put the code?

Hi, did you solve your problem? I have the same question and I’m looking forward to the answer.

Hi, any news on that warning? I’m using GRUs and I have the same issue.

Same issue… What changes have to be made in the code?

I also faced the same problem, but in my case the warning only appeared on the first call of the LSTM’s forward, and GPU and CPU memory didn’t increase.
I ran into this after I changed the trainable parameters by myself.
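
A rough sketch of what seems to be going on here (shapes and names are made up): if you replace one of the RNN’s weight Parameters by hand, the new tensor no longer lives in cuDNN’s flat weight buffer, so the warning shows up on the next forward until flatten_parameters() is called again.

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=16, hidden_size=16).cuda()

new_w = torch.randn_like(lstm.weight_hh_l0)   # shape (4 * hidden_size, hidden_size)
lstm.weight_hh_l0 = nn.Parameter(new_w)       # weight no longer lives in the flat buffer

lstm.flatten_parameters()                     # re-compact before the next call
out, _ = lstm(torch.randn(5, 3, 16, device="cuda"))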

I also met this problem when trying to stack a few single-layer RNNs to build a multi-layer one. My code looks like this:

self.rnns = [nn.LSTM(input_size=input_size if l == 0 else 2 * cell_size,
             hidden_size=cell_size, num_layers=1, batch_first=True,
             bidirectional=True) for l in range(num_layers)]

And I solved the problem by adding the following line:

self.rnns = torch.nn.ModuleList(self.rnns)
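
In case it helps anyone else, a minimal sketch of the full pattern (layer count and sizes are made up):

import torch
import torch.nn as nn

class StackedBiLSTM(nn.Module):
    def __init__(self, input_size, cell_size, num_layers):
        super().__init__()
        # ModuleList registers the sub-modules, so .cuda(), .parameters()
        # and state_dict() all see them
        self.rnns = nn.ModuleList([
            nn.LSTM(input_size=input_size if l == 0 else 2 * cell_size,
                    hidden_size=cell_size, num_layers=1, batch_first=True,
                    bidirectional=True)
            for l in range(num_layers)
        ])

    def forward(self, x):
        for rnn in self.rnns:
            rnn.flatten_parameters()   # keep each layer's weights contiguous
            x, _ = rnn(x)
        return x

model = StackedBiLSTM(input_size=32, cell_size=64, num_layers=3).cuda()
out = model(torch.randn(4, 10, 32, device="cuda"))   # (batch, seq_len, features)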

I met the same problem and solved it by adding

self.rnn.flatten_parameters()

into the

def forward(self, x)

at the position before calling

rnn_output, _ = self.rnn(x, h)

After doing so, the error never showed up again.
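
For anyone looking for the full pattern, here is a minimal sketch (the module name and sizes are illustrative):

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_size=32, hidden_size=64):
        super().__init__()
        self.rnn = nn.GRU(input_size, hidden_size, batch_first=True)

    def forward(self, x, h=None):
        self.rnn.flatten_parameters()      # re-compact weights right before the RNN call
        rnn_output, h_n = self.rnn(x, h)
        return rnn_output, h_n

enc = Encoder().cuda()
out, h_n = enc(torch.randn(8, 20, 32, device="cuda"))   # (batch, seq_len, input_size)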


This solves the problem; I didn’t know how this function works…

Hi,
I also used your suggestion, but I get a different error message.

  File "/home/.local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
    raise output
  File "/home/.local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
    output = module(*input, **kwargs)
  File "/home/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/remote/cnn-daily/models/seq2seq.py", line 170, in forward
    final_dist, sent_dist, word_dist, coverage, target = self.train_model(src_extend_pad, src_len, sent_num_, src_pad_mask, tgt_extend_pad, max_oovs)
  File "/home/remote/cnn-daily/models/seq2seq.py", line 129, in train_model
    sentence_states, sentence_contexts, contexts, features, coverage = self.hierarchy(src_extend_pad, src_len, sent_num)
  File "/home/remote/cnn-daily/models/seq2seq.py", line 108, in hierarchy
    _contexts_, _features_, _states_ = self.encoder(_src_extend_pad_, length_.tolist())
  File "/home/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/remote/cnn-daily/models/rnn.py", line 287, in forward
    hiddens, states = self.rnn(embs)
  File "/home/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/.local/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 192, in forward
    output, hidden = func(input, self.all_weights, hx, batch_sizes)
  File "/home/.local/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 323, in forward
    return func(input, *fargs, **fkwargs)
  File "/home/.local/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 287, in forward
    dropout_ts)
RuntimeError: torch/csrc/autograd/variable.cpp:115: get_grad_fn: Assertion `output_nr == 0` failed.

If I remove the self.rnn.flatten_parameters() call, this error is not raised.
Could you give some suggestions?

Can you explain the effect of this operation?
Should it be called in forward or in __init__?
Thanks!

This works pretty well for me, thanks 🙂

Thanks a lot! I’ve solved my problem with your suggestion.