In my current code I wrote:
optimizer.zero_grad()
loss.backward()
optimizer.step()
Would it be reasonable to change it to:
loss.backward()
optimizer.step()
optimizer.zero_grad()
if I want to free the GPU memory used by the gradients at the end of each training iteration? It seems that the first version might cause the memory_allocated() I call during the forward pass to also count those gradient tensors. Am I correct?
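
For reference, this is roughly how I am checking the memory (a minimal sketch of the second ordering; the model, data, and loss function here are placeholders, not my real code):

import torch
import torch.nn as nn

device = torch.device("cuda")
model = nn.Linear(1024, 1024).to(device)          # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for i in range(3):
    x = torch.randn(64, 1024, device=device)      # placeholder batch
    target = torch.randn(64, 1024, device=device)
    loss = loss_fn(model(x), target)
    # Does this reading include the .grad tensors left over from the previous iteration?
    print("after forward:  ", torch.cuda.memory_allocated())
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print("after zero_grad:", torch.cuda.memory_allocated())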