Optimizer step requires GPU memory

Thanks for the reply. So if step() clears the intermediate activations, why does the memory usage increase? I can see why the weight update itself needs some memory while it runs. But once it has finished, empty_cache() should release that memory again, shouldn’t it?

Yes, I am using functions for training and evaluation. I just wanted to give a small example that’s easy to reproduce. But in this small example, I am explicitly deleting the variables, so nothing should be retained.
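
One way to narrow this down might be to check how many bytes the optimizer itself holds before and after the first step(), since memory still referenced by live tensors cannot be returned by empty_cache(). This is only a sketch: Adam, the Linear layer and its sizes, and the helper `optimizer_state_bytes` are assumptions for illustration, not taken from my actual example.

```python
import torch


def optimizer_state_bytes(optimizer):
    """Sum the bytes of all tensors kept in the optimizer's per-parameter state."""
    total = 0
    for state in optimizer.state.values():
        for value in state.values():
            if torch.is_tensor(value):
                total += value.numel() * value.element_size()
    return total


# Fall back to CPU so the sketch also runs without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.Linear(1024, 1024).to(device)      # hypothetical toy model
optimizer = torch.optim.Adam(model.parameters())    # assumption: Adam

param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

before = optimizer_state_bytes(optimizer)           # state is created lazily

loss = model(torch.randn(8, 1024, device=device)).sum()
loss.backward()
optimizer.step()                                     # first step allocates the state

after = optimizer_state_bytes(optimizer)

# Drop everything that can be dropped, then release the cache.
del loss
optimizer.zero_grad(set_to_none=True)
if device == "cuda":
    torch.cuda.empty_cache()
    print("allocated:", torch.cuda.memory_allocated())
    print("reserved: ", torch.cuda.memory_reserved())

print("parameter bytes:      ", param_bytes)
print("optimizer state before:", before)
print("optimizer state after: ", after)
```

If `after` comes out at roughly twice `param_bytes`, the extra memory is optimizer state (Adam keeps two running averages per parameter). Those tensors stay allocated for as long as the optimizer exists, so empty_cache() would not be expected to free them.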