DataParallel issue with flatten_parameters

def lstmLayer(self, input):
    hidden = self.init_hidden(input.size(2))
    for i in range(input.size(0)):
        out, hidden = self.lstm1(input[i], hidden)
        hidden = self.repackage_hidden(hidden)
    return out

Here’s my code, but when I wrap my model with DataParallel I get this warning:

UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
  out, hidden = self.lstm1(input[i], hidden)

I am not sure where to call flatten_parameters(), or is there something else I need to do?

I think it’s this way: self.lstm1.flatten_parameters()


Thanks, but where should I call this function?

If self is an nn.Module, should we call it in the __init__() function or the forward() function? And should we call it before or after self.lstm1(input, hidden)? I always get illegal memory access errors.

@blackyang I think you have to call self.lstm1.flatten_parameters() right above out, hidden = self.lstm1(input[i], hidden); that fixed the issue for me.
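To make the placement concrete, here is a minimal, self-contained sketch of that fix. StepwiseLSTM, the layer sizes, and the batch_first layout are illustrative choices, not from the original post; the point is just that flatten_parameters() is called inside forward(), before the LSTM call, so it runs on every DataParallel replica.

import torch
import torch.nn as nn

class StepwiseLSTM(nn.Module):
    """Illustrative stand-in for the poster's module (names and sizes are made up)."""
    def __init__(self, input_size=10, hidden_size=20):
        super().__init__()
        # batch_first so DataParallel's default split along dim 0 splits the batch
        self.lstm1 = nn.LSTM(input_size, hidden_size, batch_first=True)

    def forward(self, input):                   # input: (batch, seq_len, input_size)
        # Compact the weights on each replica before the cuDNN call;
        # this is what silences the non-contiguous-weights warning.
        self.lstm1.flatten_parameters()
        hidden = None                           # nn.LSTM builds zero states when hx is None
        out = None
        for i in range(input.size(1)):          # step through time, as in the original snippet
            out, hidden = self.lstm1(input[:, i:i + 1], hidden)
        return out

if torch.cuda.is_available():
    model = nn.DataParallel(StepwiseLSTM().cuda())
    out = model(torch.randn(8, 5, 10).cuda())   # batch=8, seq_len=5, features=10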


@wgharbieh @smth I have added self.lstm1.flatten_parameters() above out, hidden = self.lstm1(input[i], hidden), but now I get another error with the following traceback:

Traceback (most recent call last):
  File "steps/run_dpcl.py", line 600, in <module>
    main(device)
  File "steps/run_dpcl.py", line 417, in main
    train(model, mix_mean, mix_var, clean_mean, clean_var, device, ema)
  File "steps/run_dpcl.py", line 149, in train
    output, hidden = model(mix_v, hidden, lengths)
  File "/mnt/tools/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/tools/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/mnt/tools/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/mnt/tools/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
    raise output
  File "/mnt/tools/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
    output = module(*input, **kwargs)
  File "/mnt/tools/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/workspace/pytorch/deep_clustering/model/blstm_dpcl.py", line 58, in forward
    output, hidden = self.blstm(inputs, hidden)
  File "/mnt/tools/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/tools/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 192, in forward
    output, hidden = func(input, self.all_weights, hx, batch_sizes)
  File "/mnt/tools/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 323, in forward
    return func(input, *fargs, **fkwargs)
  File "/mnt/tools/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 287, in forward
    dropout_ts)
RuntimeError: torch/csrc/autograd/variable.cpp:115: get_grad_fn: Assertion `output_nr == 0` failed.

Does your code run if you do not call self.lstm1.flatten_parameters()? If so, it could be because you are using DataParallel; can you try DistributedDataParallel instead? It also depends on the PyTorch version you are using: the fix above seems to work up to PyTorch 0.3.0. If you are on PyTorch 0.3.1 or 0.4, there is an open issue about it on GitHub: https://github.com/pytorch/pytorch/issues/7092


Try switching from DataParallel to DistributedDataParallel. If I remember correctly, I had a hard time making this work with multiple GPUs using DataParallel.
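For anyone trying that switch on a more recent PyTorch, here is a minimal single-node sketch. It uses the current torch.distributed API, which differs from the 0.3/0.4 versions discussed above, and the launcher, model, and tensor shapes are illustrative assumptions.

import os
import torch
import torch.distributed as dist
import torch.nn as nn

def main():
    # One process per GPU. A torchrun-style launcher is assumed to set RANK,
    # WORLD_SIZE, MASTER_ADDR/MASTER_PORT and LOCAL_RANK in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # A bare nn.LSTM stands in for the poster's model here; substitute your own module.
    model = nn.LSTM(10, 20, batch_first=True).cuda(local_rank)
    model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])

    # Each process forwards its own shard of the batch (use a DistributedSampler
    # in a real training loop so the shards do not overlap).
    out, hidden = model(torch.randn(8, 5, 10).cuda(local_rank))

if __name__ == "__main__":
    main()

Launch with one process per GPU, e.g. torchrun --nproc_per_node=<num_gpus> script.py. Since each process keeps its own copy of the model on a single device, the per-call weight replication that triggers the warning under DataParallel does not happen.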