AFAIK, the simplest way to do distributed training (multiple nodes) with PyTorch is something like:
```python
import torch

sampler = torch.utils.data.distributed.DistributedSampler(train_data)
data_loader = torch.utils.data.DataLoader(train_data, sampler=sampler)
model = torch.nn.DataParallel(model).cuda()

for data, target in data_loader:
    out = model(data)
    ...
```
But what if I already have a large tensor `data` in hand and would like to split and distribute it so that I get the same output as the snippet above? Specifically:
```python
model = torch.nn.DataParallel(model).cuda()
data = do_sth_func(data)  # split/distribute the tensor somehow
out = model(data)
```
Is there a PyTorch API to do this? If not, what is the best way to achieve it? Thank you in advance!
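For reference, the closest I have come up with is to wrap the tensor in a `TensorDataset` and go back through a `DataLoader`. This is just a sketch, and the `batch_size` value is an arbitrary placeholder:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Sketch only: wrap the in-memory tensor so the usual DataLoader
# machinery can split it into batches for DataParallel.
dataset = TensorDataset(data)                 # `data` is the large tensor already in hand
loader = DataLoader(dataset, batch_size=256)  # 256 is an arbitrary placeholder

model = torch.nn.DataParallel(model).cuda()

with torch.no_grad():
    out = torch.cat([model(batch.cuda()) for (batch,) in loader])
```

But this feels like round-tripping through the data-loading machinery for data that is already a tensor, so if there is a more direct API (e.g. something along the lines of `torch.chunk` per device), that would be what I'm after.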