I would like to train a resnetxxx model on very large images (~3000x3000) in a domain where downscaling loses relevant information. My GPU only has enough memory for a single image at a time, so I looked into gradient accumulation, discussed e.g. here. This seems to work fine for the gradient itself, but is it possible to distribute the batch norm calculations over several (micro-)batches as well?
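For context, this is the gradient accumulation pattern I mean: scale each micro-batch loss by the accumulation factor and only step the optimizer every few backward passes. A minimal sketch (the tiny model, random data, and the factor of 4 are placeholders, not my actual setup):

```python
import torch
import torch.nn as nn

# Placeholder model/data; in practice this would be the resnet and real images.
model = nn.Sequential(nn.Flatten(), nn.Linear(16, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

accum_steps = 4  # micro-batches per effective batch (example value)
data = [(torch.randn(1, 4, 4), torch.randint(0, 2, (1,))) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    loss = criterion(model(x), y) / accum_steps  # scale so grads average
    loss.backward()                              # grads accumulate in .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

This makes the accumulated gradient match a larger batch, but each forward pass still sees only one image, which is exactly why the batch norm statistics remain a problem.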
No, I don’t think batchnorm layers have this functionality built in (in case you have multiple GPUs, you could try SyncBatchNorm). For a single GPU, you could try to change the momentum and see if you can smooth the running estimates.
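To illustrate both suggestions, here is a sketch: lowering the batchnorm momentum (0.01 is just an example value, the default is 0.1), and converting the model's batchnorm layers via `nn.SyncBatchNorm.convert_sync_batchnorm` (actually syncing stats would additionally require a distributed process group / DDP setup):

```python
import torch.nn as nn

# Toy model standing in for the resnet.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())

# Lower momentum -> running estimates change more slowly per (tiny) batch.
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.momentum = 0.01  # example value; default is 0.1

# Multi-GPU option: swap BatchNorm layers for SyncBatchNorm.
sync_model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
```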
I will try to vary the normalization momentum.
Ideally, though, I would like to reduce the batch size to 1, and then the minibatch standard deviation cannot be meaningfully estimated from a single sample, so neither momentum adjustment nor syncing over GPUs helps. This is why I would like to estimate the normalization parameters (mu and sd) over ‘batches of minibatches’.
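One partial workaround I found: PyTorch batchnorm layers accept `momentum=None`, in which case the running mean/variance are updated as a cumulative average over all batches seen, which pools the running estimates across micro-batches. Note this only affects the running statistics used in eval mode; the normalization inside each training forward pass still uses the current micro-batch's own statistics. A minimal sketch:

```python
import torch
import torch.nn as nn

# momentum=None -> running stats are a cumulative average over all batches.
bn = nn.BatchNorm2d(4, momentum=None)
bn.train()
for _ in range(10):                  # ten micro-batches of size 1
    x = torch.randn(1, 4, 32, 32)
    bn(x)

# num_batches_tracked counts the batches averaged into the running stats.
print(int(bn.num_batches_tracked))  # prints 10
```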