It is because the CUDA backend uses a caching allocator. This means that when a tensor is deleted, its memory is freed from PyTorch's point of view but not returned to the device, so tools like nvidia-smi will still report it as used.
If, after running del test, you allocate more memory with test2 = torch.Tensor(1000, 1000), you will see that the memory usage stays exactly the same: PyTorch did not allocate new memory but re-used the block that was freed when you ran del test.
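In case it helps, here is a small sketch you can run to watch this happen, using torch.cuda.memory_allocated and torch.cuda.memory_reserved (which report bytes held by live tensors and bytes cached by the allocator, respectively):

```python
import torch

# Allocate a tensor on the GPU; this grabs a block from the device.
test = torch.empty(1000, 1000, device="cuda")
print(torch.cuda.memory_allocated())  # bytes in use by live tensors
print(torch.cuda.memory_reserved())   # bytes cached by the allocator

del test
# allocated drops back down, but reserved stays the same:
# the block was freed into the cache, not returned to the device.
print(torch.cuda.memory_allocated())
print(torch.cuda.memory_reserved())

# A same-sized allocation re-uses the cached block, so the total
# memory seen by nvidia-smi stays exactly the same.
test2 = torch.empty(1000, 1000, device="cuda")
print(torch.cuda.memory_reserved())
```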
Hi @albanD,
I am using different models for inference, but my GPU only has 10 GB of memory.
How do I unload a model from CUDA and load another model onto CUDA in its place?
Doing model.cpu() will move it back to the CPU.
Assuming that nothing else references the weights, they will be freed and returned to our allocator to be re-used for other Tensors.
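A minimal sketch of swapping two models within one GPU budget might look like the following. The torchvision resnet18/resnet50 models are used purely as stand-ins for your own models:

```python
import torch
import torchvision.models as models  # example models only; substitute your own

# Run inference with the first model.
model = models.resnet18(weights=None).cuda().eval()
with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224, device="cuda"))

# Move the weights off the GPU and drop the last reference so the
# GPU memory is freed back to the caching allocator.
model.cpu()
del model

# Optional: hand cached blocks back to the device so other processes
# see the memory as free. PyTorch itself would re-use the cache anyway.
torch.cuda.empty_cache()

# Load the next model; it re-uses the memory freed above.
model = models.resnet50(weights=None).cuda().eval()
with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224, device="cuda"))
```

Note that torch.cuda.empty_cache() is only needed if something outside this process (another program, or nvidia-smi) must see the memory as free; within the same process, the next model's allocations are served from the cache automatically.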