Networks usually accept any number of elements in the batch, and the loss functions average over the batch size by default.
So if you do one backward with a batch_size of 2, or two backwards with a batch_size of 1 each, the gradients will be off by a factor of 2: the first one takes the average over the batch, while the second one sums the two per-sample gradients, because gradients accumulate across backward calls.
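To make the factor of 2 concrete, here is a small framework-free sketch. It assumes a hypothetical 1-D model `y_hat = w * x` with a mean-reduced squared-error loss, and computes the gradient by hand instead of calling `backward`. The helper `grad_mean_mse` and the data values are made up for illustration. Summing the gradients of two batches of size 1 gives twice the gradient of one batch of size 2; dividing each micro-batch loss by the number of accumulation steps (the usual fix when accumulating gradients in PyTorch) restores the batch average.

```python
# Framework-free sketch: gradient of a mean-reduced MSE loss for the
# hypothetical 1-D model y_hat = w * x.
def grad_mean_mse(w, xs, ys):
    """d/dw of (1/N) * sum((w*x - y)**2) over the batch."""
    n = len(xs)
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / n

w = 0.5
xs, ys = [1.0, 2.0], [2.0, 3.0]  # made-up data

# One backward on a batch of size 2: the loss averages over the batch.
full_batch = grad_mean_mse(w, xs, ys)

# Two backwards on batches of size 1: the gradients add up across calls.
accumulated = sum(grad_mean_mse(w, [x], [y]) for x, y in zip(xs, ys))

# Fix: divide each micro-batch loss (hence its gradient) by the number
# of accumulation steps before accumulating.
steps = len(xs)
rescaled = sum(grad_mean_mse(w, [x], [y]) / steps for x, y in zip(xs, ys))

print(full_batch, accumulated, rescaled)  # -5.5 -11.0 -5.5
```

The per-sample gradients here are -3.0 and -8.0, so the batch average is -5.5 while the accumulated sum is -11.0, exactly the factor of 2 described above.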