I am trying to train an LSTM network using DataParallel with multiple GPUs, and this warning shows up. What does it mean? Will it affect the result of the training?
UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters(). (Triggered internally at /opt/conda/conda-bld/pytorch_1659484809535/work/aten/src/ATen/native/cudnn/RNN.cpp:968.) self.dropout, self.training, self.bidirectional, self.batch_first)
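For context, here is a minimal sketch of the kind of setup involved (the model, layer sizes, and names are illustrative, not my actual code). The `flatten_parameters()` call in `forward()` is the workaround the warning itself suggests:

```python
import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    """Illustrative minimal LSTM model (hypothetical sizes)."""
    def __init__(self, input_size=8, hidden_size=16):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

    def forward(self, x):
        # Under nn.DataParallel the RNN weights are replicated to each GPU
        # and may no longer sit in one contiguous memory chunk; calling
        # flatten_parameters() re-compacts them (a no-op on CPU).
        self.lstm.flatten_parameters()
        out, _ = self.lstm(x)
        return out

model = LSTMModel()
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

x = torch.randn(4, 10, 8)  # (batch, seq_len, input_size)
out = model(x)
print(out.shape)  # torch.Size([4, 10, 16])
```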
Also, can an LSTM actually be trained in parallel? I remember seeing a post about the disadvantages of RNNs and their variants that said: "In practice, however, LSTMs are much slower to train than self-attention networks as they cannot be parallelized at sequence level."