I need your help.
I only have one GPU installed in my computer. I am currently doing architecture selection, so I test several different architectures (e.g. 10 of them) in a single run, one after the other:
for arch in architectures:
    model = network(architecture=arch)
    for epoch in range(total_epochs):
        # train the architecture
        loss = model(input)
My problem is that something is accumulating in GPU memory, so I cannot finish training all the architectures even though I only keep a single model alive at a time. Testing the architectures in separate runs works, but it is too tedious. Is there a way to release GPU memory when a new architecture is instantiated as a new model?
Thanks in advance!
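For what it's worth, here is a minimal sketch of the loop with explicit cleanup between architectures (assuming PyTorch; `make_model` and `train_one_epoch` are hypothetical stand-ins for your `network(...)` constructor and training code):

```python
import gc

import torch


def train_architectures(make_model, architectures, train_one_epoch, total_epochs):
    """Train each architecture in turn, freeing GPU memory between them."""
    final_losses = []
    for arch in architectures:
        model = make_model(arch)
        for _ in range(total_epochs):
            loss = train_one_epoch(model)
        final_losses.append(float(loss))
        # Drop every reference to the model and its tensors so Python
        # can actually free them ...
        del model, loss
        gc.collect()
        # ... then return cached blocks to the CUDA driver so the next
        # architecture starts from a clean allocator state.
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    return final_losses
```

The key point is that `empty_cache()` only helps after all Python references to the old model are gone; the `del` and `gc.collect()` come first.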
The second bullet does the job for me. Thanks!
If I want to remove a module from a model, can I just do model.layer = None? Or del model.layer?
Both will work.
The only difference is what happens if you access it later: in one case you will get None, and in the other you will get an error saying that there is no attribute with this name.
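A small sketch of that difference (assuming a toy module; `Net` and `layer` are illustrative names):

```python
import torch.nn as nn


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 4)


a, b = Net(), Net()

a.layer = None   # attribute still exists, now reads back as None
del b.layer      # attribute is removed entirely

print(a.layer)   # None
# Accessing b.layer now raises AttributeError.
```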
But how do I remove a parameter from the optimizer?
That would be trickier, I don’t think our optimizer API supports removing parameters.
That being said, if the .grad field of that Tensor is None, the optimizer will just ignore it. And because it is not in the network anymore, it won't be updated, so you can safely leave it in the optimizer.
I also had the same problem in my code. Luckily, I found a way to solve it:
you have to set torch.backends.cudnn.benchmark = False
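For context (my understanding, not stated in the thread): with benchmark mode on, cuDNN autotunes and caches a convolution algorithm, including its workspace memory, per input shape, so sweeping many architectures can grow that cache. Disabling it trades some speed for a stable memory footprint:

```python
import torch

# Disable cuDNN autotuning before building any models, so no
# per-shape algorithm workspaces are cached across architectures.
torch.backends.cudnn.benchmark = False
```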