CUDA out of memory error if I don't execute optimizer.step in SWA

I have a program in which I am applying Stochastic Weight Averaging (SWA) as the optimizer. In this code, if I perform forward prop + back prop, everything seems to work fine. However, the SWA optimizer has a function called `bn_update` whose docstring says:

Updates BatchNorm running_mean, running_var buffers in the model.

It performs one pass over data in loader to estimate the activation
statistics for BatchNorm layers in the model.
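The update the docstring describes is the standard exponential-moving-average fold-in of per-batch statistics. A minimal sketch of that arithmetic (illustrative only — this is not the library's actual implementation, and the function name is made up here):

```python
def update_running_stats(running_mean, running_var, batch, momentum=0.1):
    """Fold one batch's statistics into BatchNorm-style running buffers
    using an exponential moving average (illustrative sketch)."""
    n = len(batch)
    batch_mean = sum(batch) / n
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
    new_mean = (1 - momentum) * running_mean + momentum * batch_mean
    new_var = (1 - momentum) * running_var + momentum * batch_var
    return new_mean, new_var

# One pass over the loader would call this once per batch, per BN layer.
mean, var = update_running_stats(0.0, 1.0, [1.0, 1.0], momentum=0.1)
```

Note that this bookkeeping only needs the activations of the current batch, so by itself it should not grow GPU memory across batches.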

This function essentially requires me to ONLY perform forward propagation, while the momentum-based statistics are accumulated. However, after about 2 batches, I get a CUDA out-of-memory error. I know this isn't an issue of batch size or any of the other standard reasons an out-of-memory error is thrown, because my code runs entire epochs of training with the same dataloader before calling `bn_update`, and those epochs execute flawlessly.

So why is it that forward prop alone upsets GPU memory so much?
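For context, a common culprit in forward-only loops is that autograd still records the computation graph unless gradients are explicitly disabled; during training, `backward()` frees that graph each step, but a forward-only loop never does. A minimal sketch of a forward-only pass with graph recording turned off (the model and shapes here are illustrative, not from the original code):

```python
import torch
import torch.nn as nn

# Illustrative stand-in model; the original poster's architecture is not shown.
model = nn.Sequential(nn.Linear(8, 16), nn.BatchNorm1d(16), nn.ReLU())
model.train()  # keep BatchNorm updating its running statistics

data = torch.randn(4, 8)

# Without no_grad(), each forward pass retains activations for a backward
# pass that never comes; inside no_grad(), no graph is recorded.
with torch.no_grad():
    out = model(data)

assert out.grad_fn is None  # nothing was recorded for autograd
```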

Based on your description, it seems as if SWA is using an unexpected amount of memory, or is steadily increasing its usage.
Could you post a minimal and executable code snippet showing this behavior, please?