SyncBatchNorm with SWA

Hi there,

I was wondering if there are any docs on how to use SyncBatchNorm with SWA. I have a pretrained MobileNet model which I converted to SyncBatchNorm using:

    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)

and then set up DDP, and at the end of training I tried to update the batch stats using the utility function like so:

    print('==> UPDATING BATCH STATS')
    torch.optim.swa_utils.update_bn(train_loader, swa_model, gpu_device)
    print('==> FINISHED UPDATING BATCH STATS')
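For context, here is a minimal sketch of the flow I am using (the toy `Sequential` model stands in for my MobileNet, and the `update_bn` call is commented out because it needs the real loader and device):

```python
import torch

# convert_sync_batchnorm swaps every BatchNorm*d submodule for a
# SyncBatchNorm carrying the same weights and running stats.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3),
    torch.nn.BatchNorm2d(8),
    torch.nn.ReLU(),
)
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)

# SWA wraps the (converted) model in an AveragedModel; training then
# proceeds under DDP as usual.
swa_model = torch.optim.swa_utils.AveragedModel(model)

# After training, batch stats would be refreshed with:
# torch.optim.swa_utils.update_bn(train_loader, swa_model, device=gpu_device)
```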

It starts updating the batch stats and then hangs… literally nothing happens… no error, no exit… it just sits there…

When I remove SyncBatchNorm and train and update the stats, everything works perfectly.
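One possible cause (an assumption on my part, not confirmed from the script): SyncBatchNorm's forward pass performs an `all_reduce` across the process group, so `update_bn` has to be entered by *every* rank. If only one rank calls it (e.g. inside a rank-0 guard), the collective never completes and that rank hangs with no error. A sketch of what the fix would look like (the distributed calls are commented out since they need an initialized process group):

```python
import torch

# Run update_bn on ALL ranks so SyncBatchNorm's collectives have
# partners on every process:
# torch.optim.swa_utils.update_bn(train_loader, swa_model, device=gpu_device)

# Only the save needs to be restricted to a single rank:
# if torch.distributed.get_rank() == 0:
#     torch.save(swa_model.module.state_dict(), "swa_model.pt")
```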

I browsed through the source of the above:

and I am not exactly sure what is causing this… but I suspect:

and that it is not able to find the module in the model? I'm totally clueless :sob:

I seem to get better loss with SyncBatchNorm and I would like to update the batch stats and run inference, but the fact that the updates hang is blocking me :sob:


Do you have a minimal but complete script that one can run?
If it's hanging and not doing anything, it probably requires some deeper debugging into what's happening.

Hi @smth

Thank you VERY much for your reply. Well, when I read the source, I see this:

So, I assumed this was the correct way to do it and did it… It works, but now I have a different issue. I want to run the model on the CPU for inference, so I normally do:

    traced_script_module = torch.jit.trace(swa_model, (data,))

It complains:

    raise ValueError("SyncBatchNorm expected input tensor to be on GPU")
ValueError: SyncBatchNorm expected input tensor to be on GPU

But I want to run the model on the CPU via JIT tracing, and I am not sure how to get around this issue :sob:

I presume it is impossible to JIT trace with the model on the CPU and the data on the GPU (due to the device mismatch).
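One workaround I can think of (a sketch, not an official API; `revert_sync_batchnorm` is a hypothetical helper I wrote, not part of torch) is to convert the SyncBatchNorm layers back into plain BatchNorm, reusing their learned weights and running stats, and then trace the reverted model entirely on the CPU:

```python
import torch

def revert_sync_batchnorm(module):
    # Hypothetical helper: recursively replace every SyncBatchNorm
    # with a BatchNorm2d that shares its parameters and running stats.
    mod = module
    if isinstance(module, torch.nn.SyncBatchNorm):
        mod = torch.nn.BatchNorm2d(
            module.num_features,
            module.eps,
            module.momentum,
            module.affine,
            module.track_running_stats,
        )
        if module.affine:
            mod.weight = module.weight
            mod.bias = module.bias
        mod.running_mean = module.running_mean
        mod.running_var = module.running_var
        mod.num_batches_tracked = module.num_batches_tracked
    for name, child in module.named_children():
        mod.add_module(name, revert_sync_batchnorm(child))
    return mod
```

With the reverted model in `eval()` mode and both model and example input on the CPU, `torch.jit.trace` should no longer hit the "expected input tensor to be on GPU" check.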