I have two questions about zero_grad() and releasing GPU memory.

- Does net.zero_grad() release the GPU memory occupied by the gradients computed in the previous epoch?
- Suppose I have two networks, netD and netG. In the code snippet below, can I add an extra zero_grad() after optimizer.step() to release some GPU memory before going to the next epoch? I could zero the gradients of both netD and netG at the very beginning of the loop, but what if I want to free some GPU memory for netG after training netD?
for epoch in epochs:
    # train discriminator
    netD.zero_grad()
    ...
    loss_d.backward()
    optimizer_d.step()
    # netD.zero_grad()  <------- Can I add another zero_grad() here to release some GPU memory?
    ...
    ...

    # train generator
    netG.zero_grad()
    ...
    loss_g0.backward()
    optimizer_g.step()
    # netG.zero_grad()  <------- Can I add another zero_grad() here to release some GPU memory?
    ...
    ...

    # train generator again using other losses
    netG.zero_grad()
    ...
    loss_g1.backward()
    optimizer_g.step()
    # netG.zero_grad()  <------- Can I add another zero_grad() here to release some GPU memory?
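
To make the question concrete, here is a minimal sketch of the kind of check I have in mind, assuming torch.cuda.memory_allocated() reflects live tensor allocations and using the set_to_none argument of zero_grad(); the toy model and sizes are just for illustration:

import torch
import torch.nn as nn

# toy model and input, just to produce some gradients on the GPU
net = nn.Linear(4096, 4096).cuda()
x = torch.randn(64, 4096, device="cuda")

net(x).sum().backward()
print(torch.cuda.memory_allocated())  # gradients now occupy memory

# overwrites the .grad tensors with zeros in place
net.zero_grad(set_to_none=False)
print(torch.cuda.memory_allocated())  # unchanged?

# sets .grad to None, so the tensors can be freed by the allocator
net.zero_grad(set_to_none=True)
print(torch.cuda.memory_allocated())  # lower?

If I understand correctly, set_to_none=True drops the gradient tensors entirely, while set_to_none=False only zeros them in place, so only the former could actually lower memory_allocated() — is that right, and is it safe to call right after optimizer.step()?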