I need your help.
I only have one GPU installed in my computer. I am currently doing architecture selection, so I test several different architectures (e.g. 10 of them) in a single run, one after the other:
for arch in architectures:
    model = network(architecture=arch)
    for epoch in range(total_epochs):
        # train the architecture
        loss = model(input)
My problem is that something is accumulating in GPU memory, so I cannot finish training all the architectures even though I only keep a single model alive at a time. Testing the architectures in separate runs works, but it is too tedious. Is there a way to release GPU memory when a new architecture is instantiated as a new model?
Thanks in advance!
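For what it's worth, here is a minimal sketch of the loop with explicit cleanup between architectures (assuming PyTorch; `make_model` and `train_one_epoch` are hypothetical stand-ins for your `network(...)` constructor and training code):

```python
import gc

import torch


def train_architectures(make_model, architectures, train_one_epoch, total_epochs):
    """Train each architecture in turn, freeing GPU memory between them."""
    final_losses = []
    for arch in architectures:
        model = make_model(arch)
        for _ in range(total_epochs):
            loss = train_one_epoch(model)
        final_losses.append(float(loss))
        # Drop every reference to the model and its tensors so Python
        # can actually free them ...
        del model, loss
        gc.collect()
        # ... then return cached blocks to the CUDA driver so the next
        # architecture starts from a clean allocator state.
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    return final_losses
```

The key point is that `empty_cache()` only helps after all Python references to the old model are gone; the `del` and `gc.collect()` come first.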
The second bullet does the job for me. Thanks!
If I want to remove a module from a model, can I just do model.layer = None? Or del model.layer?
Both will work.
The only difference is what happens if you access it later: in one case you will get None, and in the other you will get an error saying that there is no attribute with this name.
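A small sketch of that difference (assuming a toy module; `Net` and `layer` are illustrative names):

```python
import torch.nn as nn


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 4)


a, b = Net(), Net()

a.layer = None   # attribute still exists, now reads back as None
del b.layer      # attribute is removed entirely

print(a.layer)   # None
# Accessing b.layer now raises AttributeError.
```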
But how do I remove a parameter from the optimizer?
That would be trickier, I don’t think our optimizer API supports removing parameters.
That being said, if the .grad field of that Tensor is None, the optimizer will just ignore it. And because it is not in the network anymore, it won't be updated, so you can safely leave it in the optimizer.
I also had the same problem in my code. Luckily, I found a way to solve it:
you have to set torch.backends.cudnn.benchmark = False
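For context (my understanding, not stated in the thread): with benchmark mode on, cuDNN autotunes and caches a convolution algorithm, including its workspace memory, per input shape, so sweeping many architectures can grow that cache. Disabling it trades some speed for a stable memory footprint:

```python
import torch

# Disable cuDNN autotuning before building any models, so no
# per-shape algorithm workspaces are cached across architectures.
torch.backends.cudnn.benchmark = False
```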