Thanks for the reply. So, if step() clears the intermediate activations, then why does the memory usage increase? For the weight update, it’s clear. But after that, empty_cache() should release the memory again, shouldn’t it?
Yes, I am using functions for training and evaluation. I just wanted to give a small example that’s easy to reproduce. But in this small example, I am explicitly deleting the variables, so nothing should be retained.