When to call flatten_parameters() in LSTM when using multi-gpus

When using DataParallel to train my LSTM on multiple GPUs, I get this warning:

anaconda3/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py:42: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
output = module(*input, **kwargs)

But I already call flatten_parameters() in the forward function.
Here’s my code below:

def forward(self, x):
    x = self.share.forward(x)
    x = x.view(-1, 2048)
    x = x.view(-1, sequence_length, lstm_in_dim)
    x = x.permute(1, 0, 2)

    self.lstm.flatten_parameters()
    y, self.hidden = self.lstm(x, self.hidden)
    self.lstm.flatten_parameters()

    y = y.contiguous().view(1, sequence_length, -1, lstm_out_dim) 
    y = y.permute(0, 2, 1, 3).contiguous()
    y = y.view((-1, lstm_out_dim))
    y = self.bn(y)
    y = self.fc(y)
    return y
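
For completeness, the training side wraps the model in DataParallel in the usual way. A rough sketch of the setup I mean (MyLSTMModel and batch are placeholder names, not my real code):

import torch.nn as nn

model = nn.DataParallel(MyLSTMModel()).cuda()   # MyLSTMModel stands in for the module whose forward() is shown above
y = model(batch.cuda())                         # DataParallel splits the batch dimension across the visible GPUs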

I am new to PyTorch. Can anyone here help me with this problem?

I solved the problem by deleting the init_hidden function.
I used to call init_hidden() before every forward pass, just like in the tutorial on the official website, and it initialized the hidden state with a batch dimension of (batch_size / num_gpus). That might be what causes the error with DataParallel.
By the way, the warning "UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters()." still shows up, but everything seems to work fine.
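
In other words, the forward pass now calls the LSTM without a pre-built hidden state. A minimal sketch of that change (simplified from the code above; when no hidden state is passed, nn.LSTM starts from zeros sized to whatever batch slice each DataParallel replica receives):

def forward(self, x):
    x = self.share(x)
    x = x.view(-1, sequence_length, lstm_in_dim)
    x = x.permute(1, 0, 2)
    self.lstm.flatten_parameters()
    # No explicit hidden state: the LSTM defaults to zeros with the correct
    # per-replica batch size, so the old init_hidden() call goes away.
    y, _ = self.lstm(x)
    # ...reshape, self.bn and self.fc exactly as in the snippet above...
    return y
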
