How can I adjust BatchNorm2d layers, the final model, or the optimizer when accumulating gradients over minibatches?
For example, I have code that looks like this:
batchsize = 64
minibatchsize = 4
nminibatches = (batchsize + minibatchsize - 1) // minibatchsize

optimizer.zero_grad()
for b, (data, target) in enumerate(dataloader):
    out = net(data)
    # scale so the accumulated gradient matches one step on the full batch
    loss = compute_loss(out, target) / nminibatches
    loss.backward()  # accumulate gradients, freeing the graph each minibatch
    if (b % nminibatches) == (nminibatches - 1):
        optimizer.step()
        optimizer.zero_grad()
....
There is a problem with batch normalisation layers and small minibatches: in training mode, BatchNorm2d normalises each forward pass with the statistics of the batch it actually sees, so the result of forwarding 16 batches of size 4 is different from forwarding 1 batch of size 64 (and the running statistics get updated 16 times instead of once). How can this be corrected?
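A minimal standalone sketch of the discrepancy (the layer size and tensor shapes here are illustrative, not taken from the code above):

import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(3)  # freshly built module is in train mode: per-batch statistics
x = torch.randn(64, 3, 8, 8)

full = bn(x)  # one forward pass over the full batch of 64

# 16 forward passes of size 4: each chunk is normalised with its own mean/var
parts = torch.cat([bn(chunk) for chunk in x.split(4)])

print(torch.allclose(full, parts))  # False
# The running_mean/running_var buffers are also updated once per chunk
# (16 times) instead of once, so eval-mode behaviour drifts as well.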