I'm trying to understand the difference between reduction='sum' and reduction='mean'.

From what I understand, with 'sum' the squared error is summed over every element of every example. With 'mean' it is summed the same way and then divided by the total number of elements (number of examples × elements per example).
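As a quick sanity check of that understanding, here is a minimal sketch on a toy tensor with known values. The names `pred` and `target` are made up for illustration and are not from my actual code; the only assumption is that the divisor for 'mean' is the tensor's total element count, `numel()`.

```python
import torch
import torch.nn as nn

# Toy tensors with known values (hypothetical, not the real data).
pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
target = torch.zeros(2, 2)

# Squared errors are 1, 4, 9, 16.
loss_sum = nn.MSELoss(reduction='sum')(pred, target)    # 1 + 4 + 9 + 16 = 30
loss_mean = nn.MSELoss(reduction='mean')(pred, target)  # 30 / 4 elements = 7.5

# Dividing the summed loss by the total element count should reproduce 'mean'.
manual_mean = loss_sum / pred.numel()
print(loss_sum.item(), loss_mean.item(), manual_mean.item())
```

Using `numel()` instead of a hand-written product like 100*64*64 avoids getting the divisor wrong when a dimension changes.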

I’m confused why the code below produces different losses:

```
loss_REC = nn.MSELoss(reduction='sum')
loss_REC_mean = nn.MSELoss(reduction='mean')
L = loss_REC(x_t_hat, x_t)/float(100*64*64) # The shape of x_t is [100, 1, 64, 64]
L_mean = loss_REC_mean(x_t_hat, x_t)
print("L: ", L)
print("L_mean: ", L_mean)
```

Results:

L: 0.5480

L_mean: 0.0815


PyTorch 1.1.0

```
import torch
import torch.nn as nn

x_t = torch.rand((100, 1, 64, 64))
x_t_hat = torch.rand((100, 1, 64, 64))
loss_REC = nn.MSELoss(reduction='sum')
loss_REC_mean = nn.MSELoss(reduction='mean')
L = loss_REC(x_t_hat, x_t)/float(100*64*64) # The shape of x_t is [100, 1, 64, 64]
L_mean = loss_REC_mean(x_t_hat, x_t)
print("L: ", L)
print("L_mean: ", L_mean)
```

L is the same as L_mean.

I also did this test with random tensors and got the same result as you (L being the same as L_mean).

I looked into my actual code more and realized it was my own misunderstanding of Python. The loss tensor is mutable, so loss = L1 only binds a second name to the same tensor, and loss += 1 then modifies L1 in place:

```
L1 = loss_REC(x_t_hat, x_t)/float(100*64*64)
L1_mean = loss_REC_mean(x_t_hat, x_t)
# Losses match here
print('reconstruction loss:{:.4f}'.format(L1.item()))
print('reconstruction loss mean:{:.4f}'.format(L1_mean.item()))
# Problem in my other code
loss = L1
loss += 1
# Losses no longer match
print('reconstruction loss:{:.4f}'.format(L1.item()))
print('reconstruction loss mean:{:.4f}'.format(L1_mean.item()))
```
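For anyone who hits the same thing, here is a minimal sketch of the aliasing and one way around it. The scalar tensors are stand-ins for the real losses (an assumption for illustration), and the out-of-place `L1 + 1` is just one possible fix, not necessarily the best one for every training loop.

```python
import torch

# Stand-in for the reconstruction loss (hypothetical value).
L1 = torch.tensor(0.5)

# Problem: assignment does not copy, and += on a tensor is in-place.
loss = L1
loss += 1
same_object = loss is L1   # both names refer to one tensor
aliased = L1.item()        # L1 was changed along with loss

# Fix: use an out-of-place add so a new tensor is created.
L1 = torch.tensor(0.5)
loss = L1 + 1
print(same_object, aliased, L1.item(), loss.item())
```

With the out-of-place version, L1 keeps its original value and can still be compared against the 'mean' loss afterwards.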

Thanks for double checking!