Does the loss function's reduction="something" argument have a significant effect on training performance?

I want to consider two cases:

First, with reduction='sum':

crit = nn.MSELoss(reduction='sum').to(device)

for data, label in batch:
    output = model(data)
    loss = crit(output, data)   # summed over every element in the mini-batch
    loss.backward()
    print(loss.item())

and second, leaving the default reduction and only reading the loss value when we want output (printing, writing a log):

crit = nn.MSELoss().to(device)


for data, label in batch:
    output = model(data)
    loss = crit(output, data)   # averaged over every element (the default reduction is 'mean')
    loss.backward()
    print(loss.item())

So the first example sums the per-element losses during training, while the second leaves the default reduction. I want to know whether this choice has an impact on the training process.

That is my case, but I don't think the details matter much here: the code is for a computer vision problem. The output is a 2D matrix per sample, with a batch dimension, so the output is (batch, a, b) and the input is (batch, channel, a, b).


Depending on the definition of your specific loss function, the reduction may affect training performance. One advantage of reduction='mean' is that it makes the update term independent of the batch size, so if you want to use a constant learning rate, you are better off taking the average of the loss values. Again, it depends entirely on the task and on how the loss function is derived.
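
For illustration, here is a minimal sketch (the shapes and variable names are made up, not taken from the post above) showing that the 'sum' loss, and therefore its gradient, is exactly numel() times the 'mean' loss, which is why a learning rate tuned for one reduction does not directly carry over to the other:

import torch
import torch.nn as nn

torch.manual_seed(0)
output = torch.randn(8, 32, 32, requires_grad=True)   # hypothetical (batch, a, b) prediction
target = torch.randn(8, 32, 32)

loss_sum = nn.MSELoss(reduction='sum')(output, target)
loss_mean = nn.MSELoss(reduction='mean')(output, target)
print(loss_sum.item(), loss_mean.item() * output.numel())   # equal up to floating-point rounding

loss_mean.backward()
grad_mean = output.grad.clone()
output.grad = None
loss_sum.backward()
# the 'sum' gradient is the 'mean' gradient scaled by the number of elements
print(torch.allclose(output.grad, grad_mean * output.numel()))   # True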

Just curious: is there ever a time when reduction='sum' would be preferable? It seems like a lot more effort to keep track of the batch size, but I've seen 'sum' used as well.
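
As one illustration (just a sketch, not an authoritative recipe; model, batch, and device are assumed to exist as in the snippets above), "keeping track of the batch size" with reduction='sum' usually just means dividing by it yourself, which also gives an exact per-sample average over the epoch even when the last batch is smaller:

import torch.nn as nn

crit = nn.MSELoss(reduction='sum').to(device)

running_loss, n_samples = 0.0, 0
for data, label in batch:
    output = model(data)
    loss = crit(output, data)            # summed over all elements in the mini-batch
    (loss / data.size(0)).backward()     # normalize by the mini-batch size before the update
    running_loss += loss.item()
    n_samples += data.size(0)

print(running_loss / n_samples)          # exact average of the per-sample (summed) loss over the epoch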
