Calling loss.backward() and optimizer.step() at different frequencies

Hi, I'm new to PyTorch.

If I call loss.backward() for every 0.1% of my training set but call optimizer.step() only for every 1% of my training set, what problems could that cause?

Due to the characteristics of my training data, I wrote my code as below:

BATCH_SIZE = ...  # parameter from the user: number of backward calls per optimizer step
for epoch in range(1, 10001):
    i = 0
    (..........)
    for ..... in training_generator:
        (.......)
        loss.backward()            # accumulates gradients into .grad (summed)
        i += 1
        if i % BATCH_SIZE == 0:
            optimizer.step()       # update parameters once every BATCH_SIZE backward calls
            optimizer.zero_grad()  # reset the accumulated gradients

As far as I know, loss.backward() accumulates gradients by summation. Is this kind of gradient accumulation okay? Is there any way to divide the accumulated gradient by BATCH_SIZE?
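To illustrate what I mean by the summation, here is a tiny standalone check (not my real code):

import torch

w = torch.ones(1, requires_grad=True)
for _ in range(4):
    loss = (2 * w).sum()    # d(loss)/dw = 2 on every iteration
    loss.backward()
print(w.grad)               # tensor([8.]): the four gradients were summed, not averaged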

In my case, I use the Adam optimizer.

Based on your description, I understand that you are calling optimizer.step() more often (1 out of 100 steps) and calculating the gradients only 1 out of 1000 steps.
In this case, the general problem could be that the optimizer updates the parameters with "old" gradients, which might not work.

To change the gradients, you could either scale the loss itself (divide it by a constant) or use hooks to manipulate the .grad attributes of all parameters.
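For example, a minimal sketch of both approaches (model, criterion, and the random data here are just stand-ins for your own objects, and BATCH_SIZE is the accumulation count from your snippet):

import torch
import torch.nn as nn

BATCH_SIZE = 10                      # number of backward calls per optimizer.step()
model = nn.Linear(4, 1)              # stand-in for your model
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.MSELoss()

# Option 1: scale the loss, so the summed gradients end up as an average.
for _ in range(BATCH_SIZE):
    x, y = torch.randn(1, 4), torch.randn(1, 1)   # stand-in for one sample
    loss = criterion(model(x), y)
    (loss / BATCH_SIZE).backward()
optimizer.step()
optimizer.zero_grad()

# Option 2: register hooks that rescale every incoming gradient instead.
for p in model.parameters():
    p.register_hook(lambda grad: grad / BATCH_SIZE)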


Sorry for my English; my sentence was confusing. Actually, I meant that I'm calling loss.backward() more often than optimizer.step().

Still, you gave me the answer I wanted to know. Thanks!