gfotedar
(Gaurav)
#1
In most code, the order I see is:
# training loop
for x, y in loader:
    # forward pass and calculate loss
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
If I change it to:
# training loop
for x, y in loader:
    # forward pass and calculate loss
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
Is it still OK?
Yes, it is still fine. The usual order is clear-fill-use; yours is fill-use-clear. Either way, the gradients are cleared before the next backward() accumulates into them.
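A minimal sketch to check this, assuming a throwaway Linear model, SGD, and random data (none of it from the thread, just placeholders):

import torch

def train(order):
    # identical seed so both runs start from the same weights and data
    torch.manual_seed(0)
    model = torch.nn.Linear(2, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    data = [(torch.randn(4, 2), torch.randn(4, 1)) for _ in range(3)]
    for x, y in data:
        loss = torch.nn.functional.mse_loss(model(x), y)
        if order == "clear-fill-use":
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        else:  # fill-use-clear
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return [p.detach().clone() for p in model.parameters()]

# gradients are cleared before every backward() in both variants,
# so the final parameters come out identical
a = train("clear-fill-use")
b = train("fill-use-clear")
print(all(torch.equal(p, q) for p, q in zip(a, b)))  # True

The one observable difference is what .grad holds after the loop exits: zeros (or None) with your ordering, the last batch's gradients with the standard one.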
rad
(Andrei-Cristian Rad)
#3
Yes, but you’ll backpropagate n-1 times only, where n is the number of epochs. Why would you not want to follow the “standard” order of the 3?
There are subtle differences in memory management, especially when zero_grad(set_to_none=True) is used.
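A quick way to see that difference, again with a throwaway Linear layer and arbitrary shapes:

import torch

model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

model(torch.randn(4, 2)).sum().backward()
# set_to_none=False keeps the gradient tensors allocated and fills them with zeros
optimizer.zero_grad(set_to_none=False)
print(model.weight.grad)   # tensor([[0., 0.]])

model(torch.randn(4, 2)).sum().backward()
# set_to_none=True (the default in recent PyTorch versions) drops the tensors,
# freeing that memory until the next backward() recreates them
optimizer.zero_grad(set_to_none=True)
print(model.weight.grad)   # None

With the fill-use-clear ordering and set_to_none=True, the gradient tensors are released at the end of each iteration, so that memory is free during the next forward pass; with the standard ordering, the previous iteration's gradients stay allocated until zero_grad() runs.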