Hi,

I am training the official MNIST example, with the random seed set to 1 to remove all randomness, and mbsize set to 1 or 2 to observe how the loss changes.

I use two ways to calculate the losses of the 1st and 2nd samples. The first is to set mbsize = 1 and run 2 iterations (without calling optimizer.step()):

```
for batch_idx, (data, target) in enumerate(train_loader):
    if args.cuda:
        data, target = data.cuda(), target.cuda()
    data, target = Variable(data), Variable(target)
    optimizer.zero_grad()
    output = model(data)
    loss = F.nll_loss(output, target)
    print(loss.data[0])
    loss.backward()
```

The losses of the 1st and 2nd samples are:

2.1525135040283203

2.3103649616241455

The second is to set mbsize = 2, split the mini-batch into individual samples, and feed each sample forward separately:

```
for batch_idx, (data, target) in enumerate(train_loader):
    if args.cuda:
        data, target = data.cuda(), target.cuda()
    mbsize = data.size()[0]
    optimizer.zero_grad()
    for i in range(mbsize):
        data_x, target_x = Variable(data[i:i+1]), Variable(target[i:i+1])
        # optimizer.zero_grad()
        output = model(data_x)
        loss = F.nll_loss(output, target_x)
        print(loss.data[0])
        loss.backward()
```

The losses of the 1st and 2nd samples are the same as above:

2.1525135040283203

2.3103649616241455
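As a sanity check on what `F.nll_loss` does with a batch: with its default reduction it averages the per-sample losses, so for a *fixed* set of model outputs the batched loss should equal the mean of the individual losses. A minimal pure-Python sketch (the log-probabilities and targets below are made up for illustration):

```python
# Hypothetical fixed log-probabilities for two samples over 3 classes.
# F.nll_loss expects log-probabilities (e.g. from log_softmax).
log_probs = [
    [-2.0, -0.5, -1.2],
    [-0.3, -1.8, -2.5],
]
targets = [1, 0]

# Per-sample NLL: negative log-probability of the target class.
per_sample = [-lp[t] for lp, t in zip(log_probs, targets)]

# F.nll_loss's default reduction averages over the batch, so the
# batched loss should equal the mean of the per-sample losses.
batch_loss = sum(per_sample) / len(per_sample)
print(per_sample)
print(batch_loss)
```

So if the outputs for each sample were identical in both setups, the batched loss would be exactly the mean of the two values printed above.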

However, when I set mbsize = 2 and compute the loss over the whole mini-batch at once, the loss becomes:

2.324004888534546

It is neither the sum nor the average of these two samples' losses, so what is it?
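For completeness, here is the quick arithmetic check, using the numbers from the runs above:

```python
loss_1 = 2.1525135040283203   # per-sample loss of the 1st sample
loss_2 = 2.3103649616241455   # per-sample loss of the 2nd sample
batched = 2.324004888534546   # loss reported with mbsize = 2

total = loss_1 + loss_2       # sum of the two per-sample losses
average = total / 2           # their average

# The batched value matches neither aggregate.
print(total, average, batched)
```

The sum is about 4.4629 and the average about 2.2314, while the batched loss is 2.3240.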

Thanks