Does the loss function's reduction="something" argument have a significant effect on training performance?

I want to consider two cases:

First, with reduction='sum':

crit = nn.MSELoss(reduction='sum').to(device)

for data, label in batch:
    output = model(data)
    loss = crit(output, data)   # summed over every element in the mini-batch
    loss.backward()
    print(loss.item())

and second, leaving the default reduction and only reading the loss value when we want output (printing, writing a log):

crit = nn.MSELoss().to(device)


for data, label in batch:
    output = model(data)
    loss = crit(output, data)   # averaged over every element (the default reduction is 'mean')
    loss.backward()
    print(loss.item())

So the first example sums the per-element losses during training, while the second leaves the default reduction. I want to know whether this choice has an impact on the training process.

That is my case, but I don't think the details matter much here: the code is for a computer vision problem. The output is a 2D matrix per sample, with a batch dimension, so the output is (batch, a, b) and the input is (batch, channel, a, b).


Depending on the definition of your specific loss function, the reduction may affect training performance. One advantage of reduction='mean' is that it makes the update term independent of the batch size, so if you want to use a constant learning rate, you are better off taking the average of the loss values. Again, it depends entirely on the task and on how the loss function is derived.
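
For illustration, here is a minimal sketch (the shapes and variable names are made up, not taken from the post above) showing that the 'sum' loss, and therefore its gradient, is exactly numel() times the 'mean' loss, which is why a learning rate tuned for one reduction does not directly carry over to the other:

import torch
import torch.nn as nn

torch.manual_seed(0)
output = torch.randn(8, 32, 32, requires_grad=True)   # hypothetical (batch, a, b) prediction
target = torch.randn(8, 32, 32)

loss_sum = nn.MSELoss(reduction='sum')(output, target)
loss_mean = nn.MSELoss(reduction='mean')(output, target)
print(loss_sum.item(), loss_mean.item() * output.numel())   # equal up to floating-point rounding

loss_mean.backward()
grad_mean = output.grad.clone()
output.grad = None
loss_sum.backward()
# the 'sum' gradient is the 'mean' gradient scaled by the number of elements
print(torch.allclose(output.grad, grad_mean * output.numel()))   # True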

Just curious: is there ever a time when reduction='sum' would be preferable? It seems like a lot more effort to keep track of the batch size, but I've seen 'sum' used as well.
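
As one illustration (just a sketch, not an authoritative recipe; model, batch, and device are assumed to exist as in the snippets above), "keeping track of the batch size" with reduction='sum' usually just means dividing by it yourself, which also gives an exact per-sample average over the epoch even when the last batch is smaller:

import torch.nn as nn

crit = nn.MSELoss(reduction='sum').to(device)

running_loss, n_samples = 0.0, 0
for data, label in batch:
    output = model(data)
    loss = crit(output, data)            # summed over all elements in the mini-batch
    (loss / data.size(0)).backward()     # normalize by the mini-batch size before the update
    running_loss += loss.item()
    n_samples += data.size(0)

print(running_loss / n_samples)          # exact average of the per-sample (summed) loss over the epoch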
