SyncBatchNorm across multiple batches on the SAME GPU - is it possible?

Fake Distributed on 1 GPU

I have big samples, so I can’t use a big batch size. I virtually increase the batch size by accumulating gradients and calling optimizer.step() only every N batches. However, that of course doesn’t help the BatchNorm statistics, which are still computed per (small) batch and suffer as a result. There is only so much I can do by tuning the BatchNorm momentum… I would like to simulate a distributed setup on 1 GPU and sync the BN layers across multiple fake-parallel batches (my accumulation setup is sketched below).
Is that possible?
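
For reference, here is a minimal sketch of the gradient-accumulation setup described above. The model, optimizer, loader, and accum_steps names are placeholders for whatever you actually use; only the step/zero_grad cadence is the point.

```python
import torch
from torch import nn

# Placeholder model, optimizer, and data -- stand-ins for the real ones.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU())
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = [(torch.randn(2, 3, 32, 32), torch.randn(2, 16, 32, 32)) for _ in range(8)]

accum_steps = 4  # "virtual" batch size = 4 x the real (small) batch size

optimizer.zero_grad()
for i, (x, target) in enumerate(loader):
    out = model(x)
    loss = nn.functional.mse_loss(out, target)
    # Scale so the accumulated gradient matches one big-batch step.
    (loss / accum_steps).backward()
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
    # Note: the BatchNorm running stats were still updated per small batch
    # in the forward pass above -- this is exactly the problem described.
```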

Is using multiple GPUs to increase the effective batch size not an option for you?
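
If multiple GPUs are available, the usual route is one process per GPU (e.g. launched with torchrun), with BatchNorm layers converted to SyncBatchNorm so the statistics are computed over the combined batch of all processes. A minimal sketch, assuming an NCCL backend and a hypothetical model:

```python
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Rank and world size are read from the launcher's environment variables.
    dist.init_process_group(backend="nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU())
    # Replace every BatchNorm layer with SyncBatchNorm, then wrap in DDP.
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    # ... training loop as usual; BN stats are now synced across processes.

if __name__ == "__main__":
    main()
```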