I trained the official MNIST example, set the random seed to 1 to remove all randomness, and set mbsize = 1 or 2 to observe how the loss changes.
I used 2 ways to calculate the 1st and 2nd sample's loss. One is to set mbsize = 1 and run 2 iterations (without optimizer.step()):
```python
for batch_idx, (data, target) in enumerate(train_loader):
    if args.cuda:
        data, target = data.cuda(), target.cuda()
    data, target = Variable(data), Variable(target)
    optimizer.zero_grad()
    output = model(data)
    loss = F.nll_loss(output, target)
    print(loss.data)
    loss.backward()
```
The losses of the 1st and 2nd samples are:
The other is to set mbsize = 2, split the mini-batch into individual samples, and feed each one forward separately:
```python
for batch_idx, (data, target) in enumerate(train_loader):
    if args.cuda:
        data, target = data.cuda(), target.cuda()
    mbsize = data.size(0)
    optimizer.zero_grad()
    for i in range(mbsize):
        data_x, target_x = Variable(data[i:i+1]), Variable(target[i:i+1])
        # optimizer.zero_grad()
        output = model(data_x)
        loss = F.nll_loss(output, target_x)
        print(loss.data)
        loss.backward()
```
The losses of the 1st and 2nd samples are the same as above:
However, when I set mbsize = 2 and compute the mini-batch loss in a single forward pass, the loss becomes:
It seems to be neither the sum nor the average of the two per-sample losses, so what is it?
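For reference, here is a minimal sketch of what I expected the batched loss to be: F.nll_loss reduces over the batch by default (size_average=True in older PyTorch, reduction='mean' in newer versions), so the batched loss should equal the mean of the per-sample losses. The tensors below are toy stand-ins for the model output, not values from the actual MNIST run:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(1)

# Toy log-probabilities for a mini-batch of 2 samples, 10 classes
# (stand-ins for the real model output).
output = F.log_softmax(torch.randn(2, 10), dim=1)
target = torch.tensor([3, 7])

per_sample = F.nll_loss(output, target, reduction='none')  # loss of each sample
batch_loss = F.nll_loss(output, target)                    # default: mean over batch

print(per_sample)
print(per_sample.mean())  # equals batch_loss under the default reduction
print(batch_loss)
```

If the per-sample losses from the single-sample runs don't average to the batched loss, something other than the reduction must differ between the two setups.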