Training an LSTM with multiple GPUs and getting this warning

Peterwisu · December 24, 2022, 11:18am

Hi,

I am trying to train an LSTM network using DataParallel with multiple GPUs, and the warning below shows up. What does it mean, and will it affect the result of the training?

UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters(). (Triggered internally at  /opt/conda/conda-bld/pytorch_1659484809535/work/aten/src/ATen/native/cudnn/RNN.cpp:968.)
  self.dropout, self.training, self.bidirectional, self.batch_first)

Also, can I train an LSTM in parallel at all? I remember seeing a post about the disadvantages of RNNs and their variants which said: “In practice, however, LSTMs are much slower to train than self-attention networks as they cannot be parallelized at sequence level”.
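To illustrate the point in the quote, here is a minimal sketch (tensor sizes are arbitrary and chosen only for illustration) contrasting an LSTM unrolled with `nn.LSTMCell`, where each time step needs the previous step's hidden state, with `nn.MultiheadAttention`, which processes all positions in a single call. In both cases the batch dimension is still processed in parallel, which is what data-parallel training relies on.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, input_size, hidden_size = 4, 6, 8, 16
x = torch.randn(batch, seq_len, input_size)  # (batch, seq_len, features)

# LSTM: step t needs (h, c) from step t-1, so the loop over time is sequential.
cell = nn.LSTMCell(input_size, hidden_size)
h = torch.zeros(batch, hidden_size)
c = torch.zeros(batch, hidden_size)
for t in range(seq_len):
    h, c = cell(x[:, t, :], (h, c))  # all batch elements are processed together

# Self-attention: every position attends to every other one in batched matmuls,
# with no loop over time steps.
attn = nn.MultiheadAttention(embed_dim=input_size, num_heads=2, batch_first=True)
y, _ = attn(x, x, x)

print(h.shape, y.shape)  # torch.Size([4, 16]) torch.Size([4, 6, 8])
```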

Thank you

ptrblck · December 24, 2022, 5:28pm

No, it will not affect the training result. It can only increase memory usage, since, as the warning message explains, the non-contiguous weights have to be compacted at every call.
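A commonly suggested workaround is to call `flatten_parameters()` on the LSTM at the start of `forward`, so that each replica re-packs its weights into one contiguous buffer before the cuDNN call. The sketch below uses a hypothetical `LSTMClassifier` model with arbitrary sizes; whether this silences the warning on every DataParallel replica may depend on the PyTorch version. On CPU, `flatten_parameters()` returns without doing anything, so the snippet also runs without GPUs.

```python
import torch
import torch.nn as nn


class LSTMClassifier(nn.Module):
    """Hypothetical example model: an LSTM followed by a linear head."""

    def __init__(self, input_size=16, hidden_size=32, num_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Re-pack the LSTM weights into one contiguous cuDNN buffer on this
        # replica. On CPU, or when the weights are already flat, this is a no-op.
        self.lstm.flatten_parameters()
        out, _ = self.lstm(x)          # out: (batch, seq_len, hidden_size)
        return self.fc(out[:, -1, :])  # classify from the last time step


device = "cuda" if torch.cuda.is_available() else "cpu"
model = LSTMClassifier().to(device)
if torch.cuda.device_count() > 1:
    # DataParallel splits each batch along dim 0 across the visible GPUs
    # and gathers the outputs back on the first device.
    model = nn.DataParallel(model)

x = torch.randn(8, 10, 16, device=device)  # (batch, seq_len, input_size)
logits = model(x)
print(logits.shape)  # torch.Size([8, 4])
```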

Yes, you can use data-parallel approaches, since they split the batch (not the sequence) across GPUs. However, I would generally recommend using DistributedDataParallel, as it performs better than DataParallel.
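A minimal single-node DistributedDataParallel sketch, assuming one process per GPU launched with `torchrun` (which sets `RANK`, `LOCAL_RANK` and `WORLD_SIZE` for each process) and falling back to the `gloo` backend on CPU so it can also run as a single process. The model, the random input and the placeholder loss are illustrative only. Note that DDP parallelizes over the batch: each process still runs the LSTM over its time steps sequentially.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main() -> float:
    # torchrun sets these for every process; the defaults allow a single-process run.
    rank = int(os.environ.get("RANK", 0))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.set_device(local_rank)  # one GPU per process
    dist.init_process_group("nccl" if use_cuda else "gloo", rank=rank, world_size=world_size)
    device = torch.device("cuda", local_rank) if use_cuda else torch.device("cpu")

    lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True).to(device)
    # DDP broadcasts rank 0's weights at construction, so all replicas start equal.
    model = DDP(lstm, device_ids=[local_rank] if use_cuda else None)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # Stand-in for this rank's shard of the batch; a real script would use a
    # DataLoader with a DistributedSampler instead of random data.
    x = torch.randn(8, 10, 16, device=device)  # (batch, seq_len, input_size)
    out, _ = model(x)
    loss = out.pow(2).mean()  # placeholder loss for illustration only

    optimizer.zero_grad()
    loss.backward()  # gradients are all-reduced (averaged) across ranks here
    optimizer.step()

    dist.destroy_process_group()
    return loss.item()


if __name__ == "__main__":
    print(f"loss: {main():.4f}")
```

Saved as, for example, `train_ddp.py` (a hypothetical file name), it could be launched on a machine with two GPUs with `torchrun --nproc_per_node=2 train_ddp.py`.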