Thanks for the reply. If step() clears the intermediate activations, why does memory usage still increase? I understand that the weight update itself needs some memory, but shouldn't empty_cache() release that memory again afterwards?
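One way to narrow this down might be to compare torch.cuda.memory_allocated() (memory backing live tensors) with torch.cuda.memory_reserved() (memory held by the caching allocator) around each call. As far as I understand, empty_cache() can only return cached blocks that no tensor occupies, so anything still referenced, such as gradients or optimizer state, would stay allocated. Below is a minimal sketch, assuming an Adam optimizer and a toy nn.Linear model as hypothetical stand-ins for the real setup. It also counts the optimizer's own state tensors, which Adam creates on the first step(), so that part runs on CPU too.

```python
import torch
import torch.nn as nn


def optimizer_state_bytes(optimizer: torch.optim.Optimizer) -> int:
    """Sum the sizes of all tensors stored in the optimizer's per-parameter state."""
    total = 0
    for state in optimizer.state.values():
        for value in state.values():
            if torch.is_tensor(value):
                total += value.numel() * value.element_size()
    return total


def cuda_stats(tag: str) -> None:
    """Print allocated (live tensors) vs reserved (caching allocator) memory, CUDA only."""
    if torch.cuda.is_available():
        alloc = torch.cuda.memory_allocated() / 2**20
        reserved = torch.cuda.memory_reserved() / 2**20
        print(f"{tag:>16}: allocated={alloc:.1f} MiB, reserved={reserved:.1f} MiB")


device = "cuda" if torch.cuda.is_available() else "cpu"
# Hypothetical stand-ins for the real model and optimizer.
model = nn.Linear(1000, 1000).to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

state_before = optimizer_state_bytes(opt)  # Adam state is empty before the first step
cuda_stats("start")

x = torch.randn(8, 1000, device=device)
loss = model(x).sum()
loss.backward()  # frees the graph's saved activations, leaves .grad tensors behind
cuda_stats("after backward")

opt.step()  # first step allocates exp_avg / exp_avg_sq for every parameter
cuda_stats("after step")

# Drop every reference we control, including the gradients.
del x, loss
opt.zero_grad(set_to_none=True)
if torch.cuda.is_available():
    torch.cuda.empty_cache()  # returns only unoccupied cached blocks
cuda_stats("after empty_cache")

state_after = optimizer_state_bytes(opt)  # still referenced, so empty_cache() cannot free it
print(f"optimizer state: {state_before} -> {state_after} bytes")
```

If "allocated" stays above the starting value after empty_cache(), that memory is held by live tensors (here the Adam state) rather than by the allocator's cache.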
Yes, I use separate functions for training and evaluation; I just wanted to post a small example that is easy to reproduce. In that example, though, I explicitly delete the variables, so nothing should be retained.