Hukongtao
August 12, 2022, 7:41am
I recently upgraded my PyTorch version from 1.9.1 to 1.10.2 due to project needs, but found that training speed dropped by 20% to 30% for the same multi-task model.
After profiling, I found the reason is the following logic in torch 1.10:
# Synchronize buffers across processes.
# If we are running DDP with the join manager, we have to agree
# upon a rank to sync module buffers from, since rank 0 may
# already have been joined and have stale module buffers.
if self._join_config.enable:
    authoritative_rank = self._find_common_rank(
        self._distributed_rank, True
    )
else:
    # The process with rank 0 is considered the authoritative copy.
    authoritative_rank = 0
# Update self.modules_buffers in case any buffers were
# reassigned.
self._assign_modules_buffers()
self._distributed_broadcast_coalesced(
    self.modules_buffers,
    self.broadcast_bucket_size,
    authoritative_rank,
)
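For context, as far as I can tell this branch only runs when buffer broadcasting is enabled on the DDP wrapper, which is the default. A rough sketch of where that flag is set (model and rank are just placeholders for my real setup):

# Sketch only: broadcast_buffers controls whether DDP syncs (and therefore
# re-collects) module buffers before each forward; it defaults to True.
ddp_model = torch.nn.parallel.DistributedDataParallel(
    model,
    device_ids=[rank],
    broadcast_buffers=True,  # default; False skips this whole branch
)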
When using DDP, the module buffers are reassigned before each forward pass. The time is spent in the following section:
"""
Assigns module buffers to self.modules_buffers which are then used to
broadcast across ranks when broadcast_buffers=True. Note that this
must be called every time buffers need to be synced because buffers can
be reassigned by user module,
see https://github.com/pytorch/pytorch/issues/63916.
"""
# Collect buffers for modules, filtering out buffers that should be ignored.
named_module_buffers = [
(buffer, buffer_name)
for buffer_name, buffer in self.module.named_buffers()
]
self.modules_buffers = [
buffer
for (buffer, buffer_name) in named_module_buffers
if buffer_name not in self.parameters_to_ignore
]
I would like to know how I can avoid this reassignment of buffers on every forward pass?
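In case it helps to reproduce the comparison, below is a minimal sketch of the kind of measurement I mean. The model here is just a placeholder with a BatchNorm buffer, not my real multi-task model, and it assumes a single-node NCCL setup launched with torchrun:

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.profiler import profile, ProfilerActivity

def main():
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    # BatchNorm registers running_mean/running_var buffers, so this model
    # exercises the buffer sync / _assign_modules_buffers path shown above.
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 16, 3, padding=1),
        torch.nn.BatchNorm2d(16),
        torch.nn.ReLU(),
    ).cuda(rank)
    ddp_model = DDP(model, device_ids=[rank])

    x = torch.randn(8, 3, 32, 32, device=rank)
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        for _ in range(10):
            ddp_model(x).sum().backward()
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=20))

if __name__ == "__main__":
    main()

The idea is to run the same script under 1.9.1 and 1.10.2 and compare where the CPU time goes.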