Hello,
I have a network that uses 8 LSTM layers of size 512 each, with a batch size of 1024 and a sequence length of 512. This causes Python to use 100% of memory and my PC to freeze up, even though I have upgraded to 32 GB of RAM.
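For context, here's a back-of-envelope calculation of the activation memory for this configuration (a sketch; it only counts each layer's float32 output sequence, not gates, cell states, or gradients, so the real footprint is several times larger):

```python
# My config: 8 LSTM layers, hidden size 512, batch 1024, seq length 512.
batch, seq_len, hidden, layers = 1024, 512, 512, 8
bytes_per_float = 4  # float32

# One layer's full output sequence alone:
per_layer_bytes = batch * seq_len * hidden * bytes_per_float
total_gib = layers * per_layer_bytes / 2**30
print(per_layer_bytes / 2**30, "GiB per layer,", total_gib, "GiB for all layer outputs")
```

That's 1 GiB per layer just for the outputs, so 8 GiB before counting the per-timestep gate activations that backprop-through-time has to keep around, which I suspect is why 32 GB still isn't enough.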
I am using a large batch size because I have a very large training dataset, so it seemed to make sense as a way to get through the data quicker. When I use a smaller batch size and sequence length it doesn't freeze up, but training becomes unreasonably slow.
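One workaround I've been considering is gradient accumulation, which keeps the effective batch at 1024 while only holding a smaller micro-batch in memory at a time. A toy sketch (tiny model and random data just for illustration; all names are mine):

```python
import torch
import torch.nn as nn

# Toy stand-in for the real network, just to show the accumulation pattern.
model = nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

accum_steps = 4  # e.g. 4 micro-batches of 256 instead of one batch of 1024

opt.zero_grad()
for step in range(accum_steps):
    x = torch.randn(4, 8)   # one micro-batch of toy data
    y = torch.randn(4, 1)
    # Scale the loss so accumulated gradients match the single big batch.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()         # gradients accumulate in each parameter's .grad
opt.step()                  # one optimizer step per effective batch
```

I don't know if this fully solves the freezing, since each micro-batch still has to fit in memory with its activations.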
- Is there a way to make PyTorch consume less memory?
- Would it help if I increased the RAM again? The last upgrade seemed to make no difference.
- Is the batch size and/or architecture unreasonable? From my limited experience with RNNs the architecture doesn't seem that big, but maybe I'm mistaken.
Thanks,
Nathan