How to sync/combine processed data from a distributed dataset?

I have a data loader that is defined as follows:

    train_sampler = torch.utils.data.distributed.DistributedSampler(dataset_train)
    train_loader = torch.utils.data.DataLoader(dataset_train, batch_size=per_batch_size,
                                               shuffle=(train_sampler is None), num_workers=workers,
                                               pin_memory=True, sampler=train_sampler, drop_last=DROP_LAST)
  

During training I use 4 GPUs, and the code looks like this:

    train_sampler.set_epoch(epoch)
    DISP_FREQ = 100  # display frequency: every 100 batches
    batch = 0        # batch index
    inputs_from_all_gpus = []
    for inputs, labels in tqdm(iter(train_loader)):
        # process inputs and labels
        ...

    # ------- combine the inputs from each GPU into inputs_from_all_gpus -------

My question is: how can I combine the inputs from all GPUs into `inputs_from_all_gpus` once every distributed process has finished its part of the batch?
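
Is `torch.distributed.all_gather` the right way to do this? Below is a minimal sketch of what I have in mind (assuming the process group has already been initialized with `init_process_group` and every rank's `inputs` tensor has the same shape), but I'm not sure it's correct:

    import torch
    import torch.distributed as dist

    # Assumption: dist.init_process_group(...) was already called, and
    # `inputs` is a CUDA tensor with the same shape on every rank.
    world_size = dist.get_world_size()

    # One buffer per rank, then gather every rank's batch into them.
    gathered = [torch.zeros_like(inputs) for _ in range(world_size)]
    dist.all_gather(gathered, inputs)

    # Concatenate along the batch dimension so each rank ends up with the full batch.
    inputs_from_all_gpus = torch.cat(gathered, dim=0)

Also, do I need a `dist.barrier()` before gathering, or does `all_gather` already synchronize the ranks?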