Is it normal that the cross entropy loss increases when the batch size increases?
I have the following loss:
from torch.nn import CrossEntropyLoss

loss_fct = CrossEntropyLoss()
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
I am comparing an effective batch size of 32 using two methods:
1- Using device batch size=32
2- Using device batch size=2 with gradient accumulation step=16
For the first approach, the loss starts from 0.6599792838096619, while for the second approach it starts from 0.0303945392370224. It looks like the loss is scaled by the batch size. Do I need to divide it by the batch size again before calling backward, or is it correct as it is?
By default, CrossEntropyLoss does not increase with batch size.
From its documentation, unless you explicitly construct it with reduction = 'sum', it will default to reduction = 'mean',
for which “the sum of the output will be divided by the number
of elements in the output.”
Here is a short (pytorch version 0.3.0) script that illustrates this:
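A sketch along those lines, written against a current pytorch (using the reduction keyword rather than the old 0.3.0 API), showing that the default mean reduction keeps the loss on the same scale regardless of batch size:

import torch
from torch.nn import CrossEntropyLoss

torch.manual_seed(0)

num_labels = 5
loss_fct = CrossEntropyLoss()  # default reduction='mean'

# a "big" batch of 32 samples and a "small" batch made of its first 2 samples
logits_32 = torch.randn(32, num_labels)
labels_32 = torch.randint(0, num_labels, (32,))
logits_2, labels_2 = logits_32[:2], labels_32[:2]

# with the default mean reduction both losses sit near log(num_labels),
# not a factor of 16 apart
print(loss_fct(logits_32, labels_32))
print(loss_fct(logits_2, labels_2))

# only reduction='sum' makes the loss grow with the batch size
loss_sum = CrossEntropyLoss(reduction='sum')
print(loss_sum(logits_32, labels_32))
print(loss_sum(logits_2, labels_2))

With reduction = 'mean' the two printed losses differ only by sampling noise; with reduction = 'sum' the batch-of-32 loss is roughly sixteen times the batch-of-2 loss.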
As noted above, the larger loss is not coming from the larger batch size fed to CrossEntropyLoss. Without seeing your actual code, especially how you implement the gradient accumulation, it's hard to say where the difference is coming from.
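The usual accumulation pattern divides each micro-batch loss by the number of accumulation steps (not by the batch size, which the mean reduction already accounts for) before calling backward, so that the accumulated gradient matches the single-big-batch gradient. A toy sketch of that pattern, with a Linear layer standing in for your model and made-up data standing in for your loader:

import torch
from torch.nn import CrossEntropyLoss, Linear

torch.manual_seed(0)
num_labels, accumulation_steps = 5, 16

model = Linear(8, num_labels)  # stand-in for the real model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fct = CrossEntropyLoss()

# toy "dataset": 16 micro-batches of 2 samples = one effective batch of 32
data_loader = [(torch.randn(2, 8), torch.randint(0, num_labels, (2,)))
               for _ in range(accumulation_steps)]

optimizer.zero_grad()
for step, (inputs, labels) in enumerate(data_loader):
    logits = model(inputs)
    loss = loss_fct(logits.view(-1, num_labels), labels.view(-1))

    # scale by the number of accumulation steps, not by the batch size:
    # CrossEntropyLoss has already averaged over this micro-batch
    (loss / accumulation_steps).backward()

    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

If your training loop divides by the device batch size (or by the effective batch size of 32) on top of the mean reduction, that would explain a loss that is smaller by roughly that factor.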