RNN module weights are not part of single contiguous chunk of memory

WendyShang · August 18, 2017, 12:55am

Hi, did you try this?

self.gru.flatten_parameters()
output, h_n = self.gru(concatenated_input.transpose(0, 1), h_0)

Hao_Chu · August 18, 2017, 1:10am

I’ve tried that, but it doesn’t work.

dpernes · August 18, 2017, 10:10am

This happens not only with GRU but also with (vanilla) RNN and LSTM. I also have no idea why…

Younggun_Lee · September 1, 2017, 11:00am

I have this problem too.
In my case, this problem occurs only when I wrap the whole model with DataParallel.

@WendyShang
What is the proper usage of flatten_parameters() and what does it do?
Should I call flatten_parameters at initialization or at every forward step?

WendyShang · September 17, 2017, 3:29pm

Sorry for the late reply.

https://github.com/pytorch/pytorch/blob/59b139dabd8689163ec71fbcc987c88f3cc7e5ae/torch/legacy/nn/Module.py#L240

    # 7. fix up the parameter tensors to point at the flattened parameters
    for param, meta in zip(parameters, parameterMeta):
        param.set_(flatParameters.storage(),
                   meta['storage_offset'],
                   meta['size'],
                   meta['stride'])


    return flatParameters


def flattenParameters(self):
    _params = self.parameters()
    if _params is None:
        return
    parameters, gradParameters = _params
    p, g = self._flatten(parameters), self._flatten(gradParameters)


    assert p.nelement() == g.nelement()
    if parameters:
        for param, grad in zip(parameters, gradParameters):
            assert param.storage_offset() == grad.storage_offset()

This is the legacy nn code for flattening parameters. It makes the memory occupied by the parameters and their gradients contiguous.

However, I eventually solved this issue by deleting all the intermediate variables, such as h_n and c_n (for LSTM).
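
To make "contiguous" concrete, here is a toy sketch of the idea behind the code quoted above. It is not PyTorch's actual implementation: the tensor shapes and variable names are made up for illustration, and it uses plain slicing instead of param.set_(). Separately allocated weights are copied into one flat buffer, and each weight is then rebuilt as a view at a known offset inside that buffer.

```python
import torch

# Two separately allocated tensors standing in for RNN weights
# (shapes are illustrative, not taken from any real module).
weights = [torch.randn(3, 4), torch.randn(5)]

# Copy them into one flat buffer so they share a single allocation.
flat = torch.cat([w.reshape(-1) for w in weights])

# Rebuild each weight as a view into the flat buffer.
views, offset = [], 0
for w in weights:
    n = w.numel()
    views.append(flat[offset:offset + n].view_as(w))
    offset += n

# Each view starts at a fixed byte offset inside the same allocation.
byte_offsets = [v.data_ptr() - flat.data_ptr() for v in views]
print(byte_offsets)
```

The warning in the thread title is about the opposite situation: the RNN weights have ended up in separate allocations, so they have to be compacted again before the call.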

dylanthomas · September 18, 2017, 4:17am

How should I use flattenParameters to fix the problem? Where am I supposed to put the code?

Yu_Li · October 18, 2017, 9:49am

Hi, was your problem solved? I have the same question and I’m looking forward to the answer.

k_drosos · November 14, 2017, 5:56pm

Hi, any news on that warning? I’m using GRUs and I have the same issue.

norm · November 27, 2017, 8:14am

Same issue… What changes have to be made in the code?

jef · December 7, 2017, 4:28pm

I faced the same problem too. In my case, though, the warning only appeared on the first call to the LSTM (forward), and GPU and CPU memory usage didn’t increase.
I ran into this problem after I changed the trainable parameters myself.

BigeyeDestroyer · December 20, 2017, 4:56pm

I also ran into this problem when trying to stack several single-layer RNNs into a multi-layer one. My code looks like this:

self.rnns = [nn.LSTM(input_size=input_size if l == 0 else 2 * cell_size,
             hidden_size=cell_size, num_layers=1, batch_first=True,
             bidirectional=True) for l in range(num_layers)]

I solved the problem by adding the following line:

self.rnns = torch.nn.ModuleList(self.rnns)
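
For reference, a small sketch of the difference this line makes. The thread does not explain why the change silences the warning. What the sketch does show is that modules kept in a plain Python list are not registered as submodules, so their weights are invisible to parameters(), .cuda(), and DataParallel, whereas nn.ModuleList registers them. The class names and sizes below are hypothetical.

```python
import torch.nn as nn


def make_layers(input_size, cell_size, num_layers):
    # Same stacking pattern as in the post: bidirectional single-layer LSTMs.
    return [nn.LSTM(input_size=input_size if l == 0 else 2 * cell_size,
                    hidden_size=cell_size, num_layers=1, batch_first=True,
                    bidirectional=True) for l in range(num_layers)]


class StackedPlainList(nn.Module):
    """Hypothetical model that keeps its layers in a plain Python list."""

    def __init__(self, input_size=8, cell_size=16, num_layers=2):
        super().__init__()
        self.rnns = make_layers(input_size, cell_size, num_layers)


class StackedModuleList(nn.Module):
    """Same layers, but wrapped in nn.ModuleList so they are registered."""

    def __init__(self, input_size=8, cell_size=16, num_layers=2):
        super().__init__()
        self.rnns = nn.ModuleList(make_layers(input_size, cell_size, num_layers))


# Count the parameter tensors that each model exposes.
n_plain = len(list(StackedPlainList().parameters()))
n_registered = len(list(StackedModuleList().parameters()))
print(n_plain, n_registered)
```

With the plain list, the model reports no parameters at all, so an optimizer built from model.parameters() would not train those layers either.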

KaiyangZhou · January 2, 2018, 10:51pm

I ran into the same problem and solved it by adding

self.rnn.flatten_parameters()

into the

def forward(self, x)

at the position before calling

rnn_output, _ = self.rnn(x, h)

After that, the warning no longer showed up.
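
Put together, the fix described above might look like the following minimal sketch. The class name, layer sizes, and the optional h argument are assumptions made for illustration. As far as I understand, flatten_parameters() only does real work for CUDA weights with cuDNN enabled and returns early otherwise, so the sketch also runs on a machine without a GPU.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Hypothetical wrapper around a GRU; names and sizes are illustrative."""

    def __init__(self, input_size=8, hidden_size=16):
        super().__init__()
        self.rnn = nn.GRU(input_size, hidden_size, batch_first=True)

    def forward(self, x, h=None):
        # Compact the weights right before every call, as suggested above.
        self.rnn.flatten_parameters()
        rnn_output, h_n = self.rnn(x, h)
        return rnn_output, h_n


model = Encoder()
x = torch.randn(4, 5, 8)  # (batch, seq_len, input_size)
rnn_output, h_n = model(x)
print(rnn_output.shape, h_n.shape)
```

Calling it inside forward, rather than once at construction time, means the call also runs on the copies of the model that DataParallel creates, which is the setup several posters in this thread report the warning with.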

klory · December 13, 2018, 9:08pm

This solves the problem, though I don’t know how this function works…

DoubtWang · December 26, 2018, 1:05am

Hi,
I also tried your suggestion, but I get a different error message.

  File "/home/.local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
    raise output
  File "/home/.local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
    output = module(*input, **kwargs)
  File "/home/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/remote/cnn-daily/models/seq2seq.py", line 170, in forward
    final_dist, sent_dist, word_dist, coverage, target = self.train_model(src_extend_pad, src_len, sent_num_, src_pad_mask, tgt_extend_pad, max_oovs)
  File "/home/remote/cnn-daily/models/seq2seq.py", line 129, in train_model
    sentence_states, sentence_contexts, contexts, features, coverage = self.hierarchy(src_extend_pad, src_len, sent_num)
  File "/home/remote/cnn-daily/models/seq2seq.py", line 108, in hierarchy
    _contexts_, _features_, _states_ = self.encoder(_src_extend_pad_, length_.tolist())
  File "/home/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/remote/cnn-daily/models/rnn.py", line 287, in forward
    hiddens, states = self.rnn(embs)
  File "/home/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/.local/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 192, in forward
    output, hidden = func(input, self.all_weights, hx, batch_sizes)
  File "/home/.local/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 323, in forward
    return func(input, *fargs, **fkwargs)
  File "/home/.local/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 287, in forward
    dropout_ts)
RuntimeError: torch/csrc/autograd/variable.cpp:115: get_grad_fn: Assertion `output_nr == 0` failed.

If I remove the self.rnn.flatten_parameters() call, this error is not raised.
Could you give some suggestions?

Wu_jiang · March 8, 2019, 12:26pm

Can you explain the effect of this operation?
Should it be called in forward or in __init__?
THX

guilhermedrud · November 25, 2019, 2:55pm

This works pretty well for me, thanks

Hiker · November 30, 2019, 5:07am

Thanks a lot! I’ve solved my problem with your suggestion.

usakey · December 29, 2019, 2:44am

Thanks for your suggestion, worked perfectly. BTW, could you explain why? THX.

SimZhou · July 18, 2022, 6:15am

Thanks guys,
I added self.rnn.flatten_parameters() to forward, before calling the rnn, and it worked.

By the way, putting the call in __init__() does not work.