Split and distribute a large tensor the way torch.utils.data.DistributedSampler does

AFAIK, the simplest way to do distributed training (multiple nodes) with PyTorch is something like:

sampler = torch.utils.data.distributed.DistributedSampler(train_data)
data_loader = torch.utils.data.DataLoader(train_data, sampler=sampler)
model = torch.nn.DataParallel(model).cuda()

for data, target in data_loader:
    out = model(data)
    ...

But what if I already have a large tensor of data in hand and would like to split and distribute it, getting the same output as the snippet above? Specifically,

model = torch.nn.DataParallel(model).cuda()
data = do_sth_func(data)  # placeholder for the split-and-distribute step
out = model(data)

Is there a PyTorch API to do so? Otherwise, what is the best way to achieve it? Thank you in advance!
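
For concreteness, here is roughly the kind of manual split by rank I imagine hiding behind do_sth_func. This is only a sketch: it assumes the process group has already been initialized with torch.distributed.init_process_group, and that every process can see the full tensor (if it lives on only one node it would have to be broadcast or scattered first).

import torch
import torch.distributed as dist

def do_sth_func(data):
    # Sketch: keep only this process's shard along the batch dimension,
    # similar to how DistributedSampler partitions dataset indices.
    world_size = dist.get_world_size()
    rank = dist.get_rank()
    shards = torch.chunk(data, world_size, dim=0)
    return shards[rank].cuda()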

But what if I already have a large tensor data in hand and would like to split and distribute it and get the same output as the above snippet

It really depends on where this large tensor is stored and how it is loaded. Is the large tensor stored in the memory of one of the nodes? It would help if you described your system in a bit more detail, especially how the large tensor is computed/retrieved.

Hi, thanks for your reply!

The tensor is stored on one of the nodes. More specifically, I have, say, two nodes, and each of them has 8 GPUs. I have a text dataset train.txt, and I have written a function that converts the text data into a large tensor X.

If I use torch.nn.parallel.DistributedDataParallel in the following way:

model = torch.nn.parallel.DistributedDataParallel(model.cuda())

Would model(X) do what I want, that is, split X and distribute the pieces across the 16 GPUs?
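
To make the question concrete, this is roughly the per-process setup I have in mind. Just a sketch; the nccl backend, env:// init method, and LOCAL_RANK handling are assumptions about how the 16 processes would be launched (e.g. with torchrun).

import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl", init_method="env://")
local_rank = int(os.environ["LOCAL_RANK"])  # set by the launcher
torch.cuda.set_device(local_rank)

model = torch.nn.parallel.DistributedDataParallel(
    model.cuda(), device_ids=[local_rank]
)

# X is the full tensor built from train.txt
out = model(X.cuda())  # would this split X across all 16 GPUs?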