I encountered a weird problem with a fully connected network that has one hidden layer and no batch normalization: test-set performance varies hugely depending on the batch size. To be clear, I did switch the network to evaluation mode with `mlp.eval()` before the actual testing. Since the network has no batch normalization, I don't think this matters much in my case, although the network does have dropout.
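For reference, this is why `eval()` still matters without batch normalization. Below is a minimal plain-Python sketch, assuming the inverted-dropout scheme described in the `torch.nn.Dropout` docs: during training each activation is zeroed with probability `p` and the survivors are scaled by `1/(1-p)`; in eval mode the layer is the identity. The `dropout` function is a hypothetical stand-in, not PyTorch's implementation. Once `mlp.eval()` has been called, dropout should therefore not make test predictions depend on how examples are batched.

```python
import random


def dropout(xs, p, training, rng):
    """Inverted dropout on a list of floats (hypothetical stand-in for nn.Dropout)."""
    if not training or p == 0.0:
        # Eval mode: identity, so the output is deterministic.
        return list(xs)
    scale = 1.0 / (1.0 - p)
    # Train mode: drop each unit with probability p and rescale the survivors.
    return [0.0 if rng.random() < p else x * scale for x in xs]


rng = random.Random(0)
hidden = [0.5, -1.2, 3.0, 0.7]

train_a = dropout(hidden, 0.5, training=True, rng=rng)
train_b = dropout(hidden, 0.5, training=True, rng=rng)
eval_a = dropout(hidden, 0.5, training=False, rng=rng)
eval_b = dropout(hidden, 0.5, training=False, rng=rng)

print(train_a, train_b)  # random masks, generally different on each call
print(eval_a == eval_b == hidden)  # True: eval mode leaves activations unchanged
```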
Here is the code snippet:
```python
mlp = MLPNet(configs)
if args.cuda:
    mlp = mlp.cuda()
optimizer = optim.Adadelta(mlp.parameters(), lr=lr)
mlp.train()
for t in xrange(num_epochs):
    running_loss = 0.0
    train_loader = data_loader(source_insts, source_labels, batch_size)
    for xs, ys in train_loader:
        xs, ys = torch.from_numpy(xs), torch.from_numpy(ys)
        if args.cuda:
            xs, ys = xs.cuda(), ys.cuda()
        xs, ys = Variable(xs, requires_grad=False), Variable(ys, requires_grad=False)
        optimizer.zero_grad()
        ypreds = mlp(xs)
        loss = F.nll_loss(ypreds, ys)
        running_loss += loss.data[0]
        loss.backward()
        optimizer.step()
    logger.info("Iteration {}, loss value = {}".format(t, running_loss))
time_end = time.time()
logger.info("Time used for training on {} = {} seconds.".format(data_name[i], time_end - time_start))
# Test on other data sets.
mlp.eval()
for j in xrange(num_data_sets):
    target_idx = j
    target_insts = data_insts[j][num_trains:, :].todense().astype(np.float32)
    target_labels = data_labels[j][num_trains:, :].ravel().astype(np.int64)
    test_loader = data_loader(target_insts, target_labels, batch_size)
    num_corrects = 0.0
    for xs, ys in test_loader:
        xs, ys = torch.from_numpy(xs), torch.from_numpy(ys)
        if args.cuda:
            xs, ys = xs.cuda(), ys.cuda()
        xs, ys = Variable(xs, requires_grad=False), Variable(ys, requires_grad=False)
        ypreds = mlp(xs)
        num_corrects += torch.sum(torch.max(ypreds, 1)[1] == ys).cpu().data[0]
    acc = num_corrects / float(target_insts.shape[0])
```
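`data_loader` is not shown here. If it silently drops (or pads) the final partial batch, the number of test examples actually scored changes with `batch_size` while the denominator `target_insts.shape[0]` does not, which by itself would make `acc` move around. A minimal sketch of a loader that visits every example exactly once, assuming the inputs support `len()` and slicing (lists or NumPy arrays); `iterate_batches` is a hypothetical name, and unlike a training loader it does no shuffling.

```python
def iterate_batches(insts, labels, batch_size):
    """Yield (inputs, labels) batches covering every example exactly once.

    Hypothetical stand-in for data_loader: no shuffling, and the last batch
    may be smaller than batch_size instead of being dropped.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    n = len(labels)
    for start in range(0, n, batch_size):
        # Slicing clamps at n, so the final partial batch is kept.
        yield insts[start:start + batch_size], labels[start:start + batch_size]


labels = list(range(10))
insts = [[float(y)] for y in labels]
for bs in (1, 3, 4, 10, 32):
    seen = [y for _, ys in iterate_batches(insts, labels, bs) for y in ys]
    assert seen == labels  # every example visited once, in order
```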
What I found is that when I change `batch_size` in `test_loader`, the final `acc` varies drastically. Any ideas what the problem is?
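One thing worth ruling out (hedged, since the PyTorch version isn't stated): in older PyTorch releases, `torch.max(ypreds, 1)[1] == ys` produced a `ByteTensor`, and summing it could keep the 8-bit unsigned type, so a per-batch count above 255 would wrap around modulo 256. Small batches never reach 256 correct predictions while large ones can, so the error would grow with `batch_size`. The plain-Python sketch below only simulates that wraparound: `count_correct`, `wrap_uint8`, `wrapped_count` and `batched_accuracy` are hypothetical helpers, and `% 256` models an 8-bit counter rather than calling PyTorch.

```python
def count_correct(preds, targets):
    """Number of positions where the predicted label equals the target."""
    return sum(1 for p, t in zip(preds, targets) if p == t)


def wrap_uint8(n):
    """Model an unsigned 8-bit counter: values wrap around modulo 256."""
    return n % 256


def wrapped_count(preds, targets):
    """Per-batch correct count as if stored in an 8-bit unsigned counter."""
    return wrap_uint8(count_correct(preds, targets))


def batched_accuracy(preds, targets, batch_size, per_batch=count_correct):
    """Sum per-batch correct counts in a Python int, then divide by the total size."""
    total = 0
    for start in range(0, len(targets), batch_size):
        end = start + batch_size
        total += per_batch(preds[start:end], targets[start:end])
    return total / float(len(targets))


# Hypothetical test set of 1000 examples where every prediction is correct.
targets = [i % 10 for i in range(1000)]
preds = list(targets)

for bs in (100, 250, 500, 1000):
    exact = batched_accuracy(preds, targets, bs)
    wrapped = batched_accuracy(preds, targets, bs, per_batch=wrapped_count)
    print(bs, exact, wrapped)
# 100 1.0 1.0
# 250 1.0 1.0
# 500 1.0 0.488
# 1000 1.0 0.232
```

If this turns out to be the cause, casting the comparison result with `.long()` before summing keeps the count in a 64-bit integer; on current PyTorch releases, summing a boolean comparison already returns an `int64` tensor.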