I have a data loader that is defined as follows:
import torch

train_sampler = torch.utils.data.distributed.DistributedSampler(dataset_train)
train_loader = torch.utils.data.DataLoader(dataset_train, batch_size=per_batch_size,
                                           shuffle=(train_sampler is None), num_workers=workers,
                                           pin_memory=True, sampler=train_sampler, drop_last=DROP_LAST)
During training, I use 4 GPUs and the code looks like this:
from tqdm import tqdm

train_sampler.set_epoch(epoch)
DISP_FREQ = 100  # display every 100 batches
batch = 0  # batch index
inputs_from_all_gpus = []  # placeholder for the combined inputs
for inputs, labels in tqdm(train_loader):
    # process inputs and labels
    # ------- combine the inputs from each GPU into inputs_from_all_gpus -------
My question is: how can we combine the inputs from each GPU into inputs_from_all_gpus once all of the distributed processes are done?
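For reference, here is a minimal sketch of what I have in mind, assuming the default process group is already initialized (via torch.distributed.init_process_group) and that inputs has the same shape on every rank (drop_last=True would guarantee this). I am not sure whether torch.distributed.all_gather is the right tool:

import torch
import torch.distributed as dist

# Gather this rank's `inputs` tensor from every process in the group.
# all_gather requires the tensor to have the same shape on all ranks,
# and with the NCCL backend `inputs` must be a CUDA tensor.
world_size = dist.get_world_size()
gathered = [torch.zeros_like(inputs) for _ in range(world_size)]
dist.all_gather(gathered, inputs)

# Concatenate the per-GPU batches along the batch dimension.
inputs_from_all_gpus = torch.cat(gathered, dim=0)

One thing I am unsure about: as far as I know, dist.all_gather does not propagate gradients through the tensors gathered from other ranks, so this would only be suitable if I do not need to backpropagate through the combined tensor.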