Controlling memory usage when training many models

I have an application where I am training and discarding a large number of models (neural architecture search). After running for a while, I start getting GPU OOMs. I have not completely ruled out the possibility that the models themselves are growing too large, but based on what I have been able to debug I do not think that is the issue (especially since the models are not particularly large…). My guess is that each model allocates a chunk of GPU memory and then does not release it when the model is destroyed (or "releases" it, but PyTorch does not actually free the memory). My question is: is there a simple way to make sure the allocated memory is released?
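
For context, here is a minimal sketch of the loop I have in mind, including the cleanup I have been considering. `build_model`, `train_one`, and `record_result` are stand-ins for my actual NAS code, not real functions. Is explicitly deleting the model/optimizer and calling `gc.collect()` plus `torch.cuda.empty_cache()` the right approach, or is there something better?

```python
import gc
import torch

def search_loop(candidate_configs, device="cuda"):
    for config in candidate_configs:
        # build_model / train_one / record_result are placeholders for my code
        model = build_model(config).to(device)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        score = train_one(model, optimizer)
        record_result(config, score)

        # Attempted cleanup: drop all references to the model and its
        # optimizer state, force Python to collect them, then ask PyTorch
        # to return cached blocks from its caching allocator to the driver.
        del model, optimizer
        gc.collect()
        torch.cuda.empty_cache()
```

My understanding is that `empty_cache()` only releases memory the caching allocator holds for tensors that are already freed, so if something (e.g. a stray reference to a loss tensor or the model) is still alive, the memory would stay allocated regardless, but I may be missing something.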