In my current code I wrote:
optimizer.zero_grad()
loss.backward()
optimizer.step()
Would it be reasonable to change it to:
loss.backward()
optimizer.step()
optimizer.zero_grad()
if I want to free the GPU memory used by the gradients at the end of each training iteration? It seems that the first version might cause the memory_allocated() I call during the forward pass to also count those gradient tensors. Am I correct?
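
For reference, this is roughly how I am checking the memory (a minimal sketch of the second ordering; the model, data, and loss function here are placeholders, not my real code):

import torch
import torch.nn as nn

device = torch.device("cuda")
model = nn.Linear(1024, 1024).to(device)          # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for i in range(3):
    x = torch.randn(64, 1024, device=device)      # placeholder batch
    target = torch.randn(64, 1024, device=device)
    loss = loss_fn(model(x), target)
    # Does this reading include the .grad tensors left over from the previous iteration?
    print("after forward:  ", torch.cuda.memory_allocated())
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print("after zero_grad:", torch.cuda.memory_allocated())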