For the past few days, I’ve been training a model that uses batch normalization. While this normalization is crucial to speed up training, performance drops severely once I switch from train to eval mode. While browsing the PyTorch forums, I noticed that a lot of other people were having the same problem. The problem seems to be caused by the fact that the running estimates of the mean and variance are not reliable when using small batch sizes. For a lot of problems (e.g. segmentation), however, increasing the batch size is not feasible due to memory constraints. The authors of the batch normalization paper acknowledged this issue and wrote a follow-up paper about batch renormalization, a similar technique that should also work with smaller batches.
I was wondering if there were any plans to implement this batch renormalization in PyTorch? It seems like a feature that could be very helpful for all those users experiencing problems with regular batch normalization. I experimented with group, layer and instance normalization, and none of those performed as well as batch normalization, so batch renormalization may turn out to be very useful. I’m aware that it’s possible to implement this directly in Python, but other users who attempted this reported significantly higher memory usage and time consumption. So, an implementation directly for CPU/GPU seems necessary.
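For reference, here is a rough sketch of what the batch renormalization forward pass looks like, written in plain NumPy so the arithmetic is explicit. The function name and hyperparameter defaults are my own choices, not from any existing API; a real layer would additionally need the learned scale/shift parameters and would have to stop gradients from flowing through r and d:

```python
import numpy as np

def batch_renorm_forward(x, running_mean, running_var,
                         r_max=3.0, d_max=5.0, momentum=0.1, eps=1e-5):
    """Sketch of a batch renormalization forward pass (training mode).

    x has shape [batch, features]; running_mean/running_var are the
    per-feature running estimates, as in standard batch norm.
    """
    # Per-feature batch statistics.
    batch_mean = x.mean(axis=0)
    batch_var = x.var(axis=0)
    batch_std = np.sqrt(batch_var + eps)
    running_std = np.sqrt(running_var + eps)

    # Correction factors from the batch renormalization paper. They pull
    # the batch statistics toward the running ones; in a real
    # implementation they are treated as constants (no gradient).
    r = np.clip(batch_std / running_std, 1.0 / r_max, r_max)
    d = np.clip((batch_mean - running_mean) / running_std, -d_max, d_max)

    # Normalize with batch statistics, then apply the correction.
    x_hat = (x - batch_mean) / batch_std * r + d

    # Update the running estimates exactly as standard batch norm does.
    new_mean = (1 - momentum) * running_mean + momentum * batch_mean
    new_var = (1 - momentum) * running_var + momentum * batch_var
    return x_hat, new_mean, new_var
```

When the running estimates already match the batch statistics, r is 1 and d is 0, so the output reduces to ordinary batch normalization; the correction only kicks in when the small-batch statistics drift away from the running ones, which is exactly the eval-mode discrepancy described above.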
Unfortunately, I’m afraid that I, like most users, lack the necessary skills to implement this feature myself for CPU/GPU in PyTorch.
Hence my question whether there are any plans to implement this, or whether anyone has tips on making my model perform well in both eval and train mode with small batch sizes.
Thanks in advance.