I’m not quite sure, so it’d be best to get a dev’s opinion on this, but I do know that torch.linalg.solve synchronizes with the CPU when it’s run on the GPU (so perhaps the memory leak comes from that?). From the docs:
> When inputs are on a CUDA device, this function synchronizes that device with the CPU. For a version of this function that does not synchronize, see torch.linalg.solve_ex().
So, you could try replacing it with the torch.linalg.solve_ex method (docs here: torch.linalg.solve_ex — PyTorch 2.4 documentation) and see if you get the same memory leak. That would be one way to test this hypothesis (although it wouldn’t be conclusive).
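Roughly like this (a minimal sketch, assuming a square A and tensors already on the GPU — swap in your own shapes):

```python
import torch

A = torch.randn(64, 64, device="cuda")
B = torch.randn(64, 8, device="cuda")

# torch.linalg.solve checks the factorization for errors,
# which forces a GPU -> CPU synchronization:
X = torch.linalg.solve(A, B)

# torch.linalg.solve_ex skips that check (no sync) and instead
# returns an `info` tensor you can inspect yourself if needed:
X, info = torch.linalg.solve_ex(A, B)
```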
Also, if you’re trying to use in-place operations to speed up PyTorch, they’ll make minimal difference in this use case. If you’re purely computing the eigenvalues (and don’t need any gradients), run your code within a torch.no_grad() context manager, which will speed up your code and avoid keeping autograd buffers around. Docs here: no_grad — PyTorch 2.4 documentation
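For example (a sketch, assuming you’re computing eigenvalues of a general square matrix — adapt to your actual code):

```python
import torch

A = torch.randn(64, 64, device="cuda")

# no_grad() disables autograd graph construction, so no intermediate
# buffers are stored for a backward pass:
with torch.no_grad():
    eigvals = torch.linalg.eigvals(A)
```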