I cannot reproduce the issue on a 3090. After adding `print(torch.cuda.memory_allocated()/1024**2)` inside the for loop, I get:
4.78515625
4.78515625
4.78515625
...
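For reference, here is roughly how I measured it. The actual model and loop from your post aren't shown here, so the training loop is only sketched in comments; the helper also falls back to `torch.mps.current_allocated_memory()` so the same check can be run on an MPS machine:

```python
import torch

def allocated_mib() -> float:
    """Return currently allocated tensor memory in MiB for the active backend.

    Note: the CUDA/MPS calls report memory held by live tensors; on a
    CPU-only machine this simply returns 0.0.
    """
    if torch.cuda.is_available():
        return torch.cuda.memory_allocated() / 1024**2
    if torch.backends.mps.is_available():
        return torch.mps.current_allocated_memory() / 1024**2
    return 0.0

# Hypothetical loop (replace with your own model/optimizer):
# for step in range(100):
#     loss = model(batch).sum()
#     loss.backward()
#     optimizer.step()
#     optimizer.zero_grad()
#     print(f"step {step}: {allocated_mib():.2f} MiB")
```

A steadily growing number here (rather than a flat value like the `4.78515625` above) would indicate tensors are being kept alive across iterations.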
Is this post also related to the one you created yesterday?
If so, could you create a GitHub issue for the potential memory leak on MPS?