Order of backward(), step() and zero_grad()

In most code I see, the order is:

for inputs, targets in dataloader:        # training loop
    outputs = model(inputs)               # forward pass
    loss = criterion(outputs, targets)    # calculate loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

If I change it to:

for inputs, targets in dataloader:        # training loop
    outputs = model(inputs)               # forward pass
    loss = criterion(outputs, targets)    # calculate loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

is it still OK?

Yes, it is fine: the usual order is clear-fill-use, yours is fill-use-clear. Either way, each step() consumes the gradients produced by exactly one backward() call.
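
A minimal check of this (a sketch: the toy model, data, and the train() helper are invented for illustration) showing that both orders reach identical weights when no stale gradients exist before the loop:

    import torch

    x = torch.randn(8, 3)
    y = torch.randn(8, 1)

    def train(clear_first):
        torch.manual_seed(0)  # identical initialization for both runs
        model = torch.nn.Linear(3, 1)
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        for _ in range(5):
            loss = torch.nn.functional.mse_loss(model(x), y)
            if clear_first:
                opt.zero_grad()   # clear-fill-use
            loss.backward()
            opt.step()
            if not clear_first:
                opt.zero_grad()   # fill-use-clear
        return model.weight.detach().clone()

    print(torch.allclose(train(True), train(False)))  # True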

Yes, but be careful: with this order nothing clears the gradients before the first step(), so anything that populated them before the loop (a warm-up pass, a stray backward() while debugging) gets folded into the first update, and the final zero_grad() after the last step() is wasted work. Why would you not want to follow the "standard" order of the three?
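
To make that caveat concrete, here is a contrived sketch: a backward() call before the loop leaves gradients populated, and with the fill-use-clear order nothing removes them before the first step():

    import torch

    model = torch.nn.Linear(3, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(8, 3), torch.randn(8, 1)

    # e.g. a warm-up or debugging pass that left gradients behind
    torch.nn.functional.mse_loss(model(x), y).backward()

    for _ in range(3):
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()   # first iteration: accumulates onto the leftovers
        opt.step()        # so the first update uses a doubled gradient
        opt.zero_grad()   # the cleanup comes too late for that first step

With the standard clear-fill-use order, the leftover gradients would have been wiped before the first backward().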

There are also subtle differences in memory management, especially when zero_grad(set_to_none=True) is used: where you place the call decides when the gradient buffers are released.
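
A sketch of that memory point: with set_to_none=True (the default since PyTorch 2.0), zero_grad() replaces each parameter's .grad with None instead of zero-filling it in place, so calling it right after step() lets the gradient memory be reclaimed before the next forward pass:

    import torch

    model = torch.nn.Linear(3, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(8, 3), torch.randn(8, 1)

    for _ in range(3):
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
        # drops the .grad tensors entirely rather than writing zeros into them,
        # so their memory is free during the next forward/backward
        opt.zero_grad(set_to_none=True)

    print(model.weight.grad)  # None, not a zero tensor

One caveat: any code that reads p.grad afterwards has to handle None.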