Excessive UserWarning output during RNN training

Hi All,

I am brand new to PyTorch, having come from Keras, and am in the process of converting an RNN model. Because the PyTorch RNN modules do not currently support recurrent dropout, which Keras does, I have turned to https://github.com/keitakurita/Better_LSTM_PyTorch. The "Better LSTM" model is giving me results comparable to Keras, but what is not working well is the excessive number of UserWarnings emitted during training.
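For context, this is roughly how I am using the layer. The dropouti/dropoutw/dropouto keyword names are my reading of the repo's README, and the sizes and rates are placeholders rather than my real configuration:

```python
import torch
from better_lstm import LSTM  # from keitakurita/Better_LSTM_PyTorch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # dropoutw is the recurrent (weight) dropout that plain nn.LSTM lacks;
        # all sizes and dropout rates below are placeholders for illustration.
        self.lstm = LSTM(input_size=64, hidden_size=128, batch_first=True,
                         dropouti=0.3, dropoutw=0.3, dropouto=0.3)
        self.head = torch.nn.Linear(128, 1)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, seq, hidden)
        return self.head(out[:, -1])   # prediction from the last time step
```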

Version information:

sys.version: 3.7.5 (default, Oct 25 2019, 15:51:11)
[GCC 7.3.0]
torch.__version__: 1.3.1
device: cuda
cuda version: nvcc: NVIDIA ® Cuda compiler driver
Copyright © 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

Sample output:

Epoch 1/500/opt/conda/conda-bld/pytorch_1573049306803/work/aten/src/ATen/native/cudnn/RNN.cpp:1268: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
Epoch 1/500 ETA 13s Step 1/609: loss: 0.942658/opt/conda/conda-bld/pytorch_1573049306803/work/aten/src/ATen/native/cudnn/RNN.cpp:1268: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
Epoch 1/500 ETA 11s Step 2/609: loss: 0.725145/opt/conda/conda-bld/pytorch_1573049306803/work/aten/src/ATen/native/cudnn/RNN.cpp:1268: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
Epoch 1/500 ETA 9s Step 3/609: loss: 0.644127/opt/conda/conda-bld/pytorch_1573049306803/work/aten/src/ATen/native/cudnn/RNN.cpp:1268: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
.
.
.

I did try modifying the "Better LSTM" code to include a call to flatten_parameters() at the beginning of forward(), but it made no difference to the number of UserWarnings emitted.
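I did not keep the exact diff, but the change was equivalent to compacting the weights right before every forward pass, roughly like this (the wrapper class below is purely for illustration, and I am assuming the Better LSTM layer exposes flatten_parameters() the same way nn.LSTM does):

```python
import torch.nn as nn

class FlattenBeforeForward(nn.Module):
    """Hypothetical wrapper showing where I inserted the call."""
    def __init__(self, lstm):
        super().__init__()
        self.lstm = lstm  # the Better LSTM layer

    def forward(self, x, hx=None):
        # Attempted fix: compact the cuDNN weight buffer before each call.
        # In my case this did not reduce the number of UserWarnings.
        self.lstm.flatten_parameters()
        return self.lstm(x, hx)
```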

Suggestions?

Thanks,

Lars