Hi,

I am training the official MNIST example, with the random seed set to 1 to remove all randomness, and mbsize set to 1 or 2 to observe how the loss changes.

I use two ways to calculate the losses of the 1st and 2nd samples. The first is to set mbsize = 1 and run 2 iterations (without calling optimizer.step()):

```
for batch_idx, (data, target) in enumerate(train_loader):
    if args.cuda:
        data, target = data.cuda(), target.cuda()
    data, target = Variable(data), Variable(target)
    optimizer.zero_grad()
    output = model(data)
    loss = F.nll_loss(output, target)
    print(loss.data[0])
    loss.backward()
```

The losses of the 1st and 2nd samples are:

2.1525135040283203

2.3103649616241455

The second is to set mbsize = 2, split the mini-batch into individual samples, and feed each sample forward separately:

```
for batch_idx, (data, target) in enumerate(train_loader):
    if args.cuda:
        data, target = data.cuda(), target.cuda()
    mbsize = data.size()[0]
    optimizer.zero_grad()
    for i in range(mbsize):
        data_x, target_x = Variable(data[i:i+1]), Variable(target[i:i+1])
        # optimizer.zero_grad()
        output = model(data_x)
        loss = F.nll_loss(output, target_x)
        print(loss.data[0])
        loss.backward()
```

The losses of the 1st and 2nd samples are the same as above:

2.1525135040283203

2.3103649616241455
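As a sanity check on what `F.nll_loss` does with a batch: with its default reduction it averages the per-sample losses, so for a *fixed* set of model outputs the batched loss should equal the mean of the individual losses. A minimal pure-Python sketch (the log-probabilities and targets below are made up for illustration):

```python
# Hypothetical fixed log-probabilities for two samples over 3 classes.
# F.nll_loss expects log-probabilities (e.g. from log_softmax).
log_probs = [
    [-2.0, -0.5, -1.2],
    [-0.3, -1.8, -2.5],
]
targets = [1, 0]

# Per-sample NLL: negative log-probability of the target class.
per_sample = [-lp[t] for lp, t in zip(log_probs, targets)]

# F.nll_loss's default reduction averages over the batch, so the
# batched loss should equal the mean of the per-sample losses.
batch_loss = sum(per_sample) / len(per_sample)
print(per_sample)
print(batch_loss)
```

So if the outputs for each sample were identical in both setups, the batched loss would be exactly the mean of the two values printed above.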

However, when I set mbsize = 2 and compute the loss over the whole mini-batch at once, the loss becomes:

2.324004888534546

It is neither the sum nor the average of these two samples' losses, so what is it?
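For completeness, here is the quick arithmetic check, using the numbers from the runs above:

```python
loss_1 = 2.1525135040283203   # per-sample loss of the 1st sample
loss_2 = 2.3103649616241455   # per-sample loss of the 2nd sample
batched = 2.324004888534546   # loss reported with mbsize = 2

total = loss_1 + loss_2       # sum of the two per-sample losses
average = total / 2           # their average

# The batched value matches neither aggregate.
print(total, average, batched)
```

The sum is about 4.4629 and the average about 2.2314, while the batched loss is 2.3240.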

Thanks