How to sync/combine processed data from distributed dataset?

I have a data loader that is defined as in the link:

    train_sampler =
    train_loader =, batch_size=per_batch_size,
                                                shuffle = (train_sampler is None), num_workers=workers,
                                                pin_memory=True, sampler=train_sampler, drop_last=DROP_LAST)

During training I use 4 GPUs, and the code looks like this:

    DISP_FREQ = 100  # display every 100 batches
    batch = 0  # batch index
    for inputs, labels in tqdm(iter(train_loader)):
        # process inputs and labels
    # ------- combine inputs from each GPU into inputs_from_all_gpus -------

My question is: how can we combine the inputs from each GPU into inputs_from_all_gpus once processing on all the distributed processes is done?
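
To make the goal concrete, here is a minimal sketch of the kind of combine step I have in mind, using `torch.distributed.all_gather`. The helper name `gather_from_all_gpus` is hypothetical (not from my code). It assumes the process group has already been initialized, that every rank passes a tensor of the same shape and dtype, and that with the NCCL backend the tensor lives on that rank's GPU.

```python
import torch
import torch.distributed as dist


def gather_from_all_gpus(local_tensor):
    """Concatenate `local_tensor` from every rank along dim 0.

    Hypothetical helper. Assumes all ranks call it with tensors of
    identical shape and dtype, as dist.all_gather requires.
    """
    # Without an initialized process group there is only one process,
    # so there is nothing to combine.
    if not (dist.is_available() and dist.is_initialized()):
        return local_tensor

    local_tensor = local_tensor.contiguous()
    world_size = dist.get_world_size()

    # One receive buffer per rank; all_gather fills buffers[i] with rank i's tensor.
    buffers = [torch.empty_like(local_tensor) for _ in range(world_size)]
    dist.all_gather(buffers, local_tensor)

    # Every rank ends up with the same combined tensor.
    return torch.cat(buffers, dim=0)
```

Inside the loop this would be called as `inputs_from_all_gpus = gather_from_all_gpus(inputs)`, and every process has to make the call, since `all_gather` is a collective operation. Note that the gathered copies from other ranks do not carry gradients back to those ranks, so this is suited to collecting data or metrics rather than to backpropagating through the combined tensor.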