I want to consider two cases.
First, with reduction='sum':
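```python
import torch.nn as nn

crit = nn.MSELoss(reduction='sum').to(device)
…
for data, label in batch:
    output = model(data)
    loss = crit(output, label)  # compare against the target; data is the input and has a different shape
    loss.backward()
    print(loss.item())  # loss.data[0] fails on a 0-dim tensor in recent PyTorch; .item() returns the Python number
```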
And second, with the default reduction (for nn.MSELoss that is 'mean', not 'none'), summing only when we want to output something (print, write a log):
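```python
crit = nn.MSELoss().to(device)  # default: reduction='mean'
…
for data, label in batch:
    output = model(data)
    loss = crit(output, label)
    loss.backward()
    print(loss.sum().item())  # .sum needs parentheses; loss is already a scalar, so this equals loss.item()
```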
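For reference, the three `reduction` modes of `nn.MSELoss` relate simply: 'none' returns the per-element squared errors, 'sum' adds them up, and 'mean' (the default) divides that sum by the number of elements. A small check with random tensors; the shapes are arbitrary and only meant to resemble a (batch, a, b) output:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Arbitrary example shapes: (batch, a, b)
output = torch.randn(4, 8, 8)
target = torch.randn(4, 8, 8)

per_elem = nn.MSELoss(reduction='none')(output, target)  # same shape as the inputs
summed = nn.MSELoss(reduction='sum')(output, target)      # 0-dim tensor
averaged = nn.MSELoss()(output, target)                   # default: reduction='mean'

n = output.numel()  # 4 * 8 * 8 = 256 elements
print(per_elem.shape)
print(torch.allclose(per_elem.sum(), summed))
print(torch.allclose(summed / n, averaged))
```

Note that with reduction='none' the loss is not a scalar, so calling `loss.backward()` on it directly raises an error; it has to be reduced (for example with `.sum()` or `.mean()`) first.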
So in the first example we set reduction='sum', and in the second we keep the default and only sum when we need to print. I want to know whether this choice has an impact on the training process.
For context, though I don't think it is important here: the code is for a computer vision problem. The output is a 2D matrix per sample, so with the batch dimension it has shape (batch, a, b), while the input has shape (batch, channel, a, b).
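One way to see what the choice changes is to compare the gradients each setting produces for the same parameters. The sketch below assumes a hypothetical toy model (a single `nn.Linear` layer) and random data rather than the real vision model. Because the 'mean' loss is the 'sum' loss divided by the number of elements N, its gradients are also scaled by 1/N: the direction is identical and only the magnitude differs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy setup: a linear layer mapping 16 features to 8 outputs
model = nn.Linear(16, 8)
data = torch.randn(4, 16)   # batch of 4 inputs
label = torch.randn(4, 8)   # matching targets

def grad_for(reduction):
    model.zero_grad()  # clear gradients left over from a previous backward()
    loss = nn.MSELoss(reduction=reduction)(model(data), label)
    loss.backward()
    return model.weight.grad.clone()

g_sum = grad_for('sum')
g_mean = grad_for('mean')

n = label.numel()  # number of elements 'mean' averages over (4 * 8 = 32)
print(torch.allclose(g_sum / n, g_mean, atol=1e-6))
```

With plain SGD (no weight decay), switching from 'mean' to 'sum' therefore acts like multiplying the learning rate by N, which in turn depends on the batch size and the output resolution. Adaptive optimizers such as Adam largely normalize this gradient scale away.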