@ptrblck I’m measuring allocated system memory by watching the resident memory of the Python process in htop rise and fall as I run my script in a REPL.
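For a reading that’s easier to log than eyeballing htop, something like this psutil snippet is what I have in mind (psutil isn’t part of my actual script; this is just a sketch of the measurement):

```python
import os
import psutil

def log_rss(tag: str) -> None:
    # RSS (resident set size) of the current process, in MB.
    rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2
    print(f"{tag}: rss = {rss_mb:.1f} MB")
```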
I also just ran a longer experiment using the macOS build of CPU-only PyTorch 1.5.1 (the only place I’ve seen the LSTM memory released correctly so far). Memory usage remained bounded (below 400 MB), and it was completely released at each stage.
I just started running the same script on the same data, the only difference being the Linux build of CPU-only PyTorch 1.5.1 (this one specifically: https://download.pytorch.org/whl/cpu/torch-1.5.1%2Bcpu-cp38-cp38-linux_x86_64.whl). Within about five minutes it had already consumed a gigabyte, and it continues to climb.
I’ll see if I can figure out tomorrow what’s different between the two builds of CPU-only PyTorch, but I’m out of my depth if it turns out to be an MKL-DNN bug or something.
By the way, the script I’m currently running involves my actual application code, including training, which I can’t share. However, if it would be helpful, I’d be happy to craft a minimal example that exhibits the same behavior and that I can share; it doesn’t appear hard to reproduce. It would probably look something like the sketch below.
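Roughly this shape, with placeholder sizes rather than my real model (so take it as a sketch, not the actual repro yet). If the leak reproduces, RSS should climb across iterations on the Linux wheel but stay bounded on the macOS build:

```python
import os
import psutil
import torch
import torch.nn as nn

# Placeholder sizes; my real script trains on application data,
# but the behavior doesn't seem to depend on anything exotic.
model = nn.LSTM(input_size=128, hidden_size=256, num_layers=2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(10_000):
    x = torch.randn(50, 32, 128)  # (seq_len, batch, input_size)
    out, _ = model(x)
    loss = out.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 500 == 0:
        rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2
        print(f"step {step}: rss = {rss_mb:.1f} MB")
```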