Flatten_parameters fails with DataParallel?

I have an RNN model. I did something like

self.lstm.flatten_parameters()

with DataParallel on multiple GPUs, in order to eliminate this user warning:

UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
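For context, the setup is roughly like this (the class name, sizes, and exact call site are placeholders, not my real code):

import torch
import torch.nn as nn

# Placeholder model -- the class name, sizes, and batch_first setting
# are assumptions for illustration, not the real code.
class MyRNN(nn.Module):
    def __init__(self, input_size=128, hidden_size=256):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

    def forward(self, x):
        # Re-compact the LSTM weights on each replica; DataParallel's
        # broadcast is what leaves them non-contiguous and triggers
        # the warning above.
        self.lstm.flatten_parameters()
        out, _ = self.lstm(x)
        return out

# Assumes CUDA GPUs are available.
model = nn.DataParallel(MyRNN().cuda())
out = model(torch.randn(8, 10, 128).cuda())  # (batch, seq_len, input_size)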

However, I got this error:

RuntimeError: torch/csrc/autograd/variable.cpp:115: get_grad_fn: Assertion `output_nr == 0` failed.

Any ideas on this? Is this a bug in PyTorch?

What does it say when it fails?

I managed to fix that, but now I get a new error from get_grad_fn (details in the question). Do you have any idea? Thanks!