Is this way of computing the batch loss correct?

I want to implement batch learning in my code.

The shape of my data samples differs from sample to sample, so I can't stack them into a single batch tensor. If I set the batch size to 10, my idea is to run the forward pass over the 10 samples one at a time, average the 10 loss values, call loss.backward() once, and then update. I would like to know whether this is the same as updating with real batch learning.
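
Roughly like this sketch (the model, loss, and data here are just stand-ins for my real setup, where each sample has a different shape):

```python
import torch
import torch.nn as nn

# stand-in model and data; in my real code the samples have different shapes
model = nn.Linear(4, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
samples = [(torch.randn(1, 4), torch.randn(1, 1)) for _ in range(10)]

optimizer.zero_grad()
losses = []
for x, y in samples:                  # 10 separate forward passes
    losses.append(criterion(model(x), y))
loss = torch.stack(losses).mean()     # average the 10 loss values
loss.backward()                       # one backward pass on the averaged loss
optimizer.step()                      # one parameter update for the "batch"
```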

Yes, running 10 separate forward passes and calling backward() once on the averaged loss yields the same gradients as a single forward pass over a batch of 10. However, certain layers like nn.BatchNorm will behave differently, since the batch statistics (and the running estimates updated from them) are computed per forward pass, so with single-sample passes they only ever see one sample at a time.
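
As a rough illustration (using a toy nn.Linear model and dummy data, so no batchnorm is involved), accumulating gradients sample by sample with each loss scaled by 1/N produces the same .grad values as averaging the losses and calling backward() once; the per-sample variant also uses less memory, because each graph is freed right after its backward call:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)                # toy stand-in model
criterion = nn.MSELoss()
samples = [(torch.randn(1, 4), torch.randn(1, 1)) for _ in range(10)]

# Variant A: average the losses, then a single backward pass
model.zero_grad()
loss = torch.stack([criterion(model(x), y) for x, y in samples]).mean()
loss.backward()
grads_a = [p.grad.clone() for p in model.parameters()]

# Variant B: gradient accumulation -- backward per sample on loss / N
model.zero_grad()
for x, y in samples:
    (criterion(model(x), y) / len(samples)).backward()
grads_b = [p.grad.clone() for p in model.parameters()]

print(all(torch.allclose(a, b) for a, b in zip(grads_a, grads_b)))  # True
```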