Distribute Dataset Target Transform to GPUs


My Dataset stores a list of parameter values that are transformed, via a computationally intensive simulation, into the high-dimensional tensors I feed my neural network.

It would be infeasible to precompute and save the final tensors, since I am still working on the simulation process and would have to regenerate them often before I could even start training.

I put the generation of the input tensors into the Dataset’s __getitem__ method and accelerate the computation with my GPU, which puts me in the (ill-advised) position of loading my tensors onto the GPU inside the Dataset rather than afterwards.
In other words, the (target) transform in my Dataset is compute-intensive and happens on the GPU.
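For concreteness, a minimal sketch of this setup might look as follows (the `_simulate` method is a hypothetical stand-in for the real compute-intensive simulation, and the parameter layout is assumed):

```python
import torch
from torch.utils.data import Dataset

class SimulationDataset(Dataset):
    """Stores only parameter values; tensors are simulated on demand."""

    def __init__(self, params, device="cuda"):
        self.params = params  # list of parameter vectors
        self.device = device  # device the simulation runs on

    def __len__(self):
        return len(self.params)

    def __getitem__(self, idx):
        p = torch.as_tensor(self.params[idx], device=self.device)
        # compute-intensive transform happens here, on the GPU
        target = self._simulate(p)
        return p, target

    def _simulate(self, p):
        # placeholder for the real simulation; anything GPU-heavy goes here
        return torch.outer(p, p)
```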

I actually have multiple GPUs at my disposal that I would like to take advantage of.
How do I proceed?
I imagine needing to spawn multiple threads and give each its own device (and its own Dataset and/or DataLoader?).
How do I combine the outputs of the multiple threads and move them onto the device on which the backpropagation happens?

Thank you very much!

Hi, I tried to answer your question, but without more detailed context I am not sure it will really help you.

Can you try using torch.utils.data.distributed.DistributedSampler (see torch.utils.data — PyTorch 2.0 documentation) together with torch.nn.parallel.DistributedDataParallel (see DistributedDataParallel — PyTorch 2.0 documentation)?

We need to do multi-process programming here: one process per GPU, with DistributedSampler giving each process a disjoint shard of the dataset and DistributedDataParallel synchronizing the gradients, so you never have to combine the outputs manually.
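A rough sketch of what that could look like, assuming you launch one process per GPU with torchrun (the `dataset_factory` and `model_factory` callables are hypothetical placeholders for your own Dataset and network, and MSE is just an assumed loss):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def train(dataset_factory, model_factory, epochs=10, batch_size=32):
    """One process per GPU. dataset_factory(device) should return a Dataset
    that runs its simulation on the given device; model_factory() returns
    the network. Launch with: torchrun --nproc_per_node=NUM_GPUS train.py
    """
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for each process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = torch.device(f"cuda:{local_rank}")

    # each rank's Dataset simulates on its own GPU; keep num_workers at the
    # default of 0, since __getitem__ touches CUDA directly
    dataset = dataset_factory(device)
    sampler = DistributedSampler(dataset)  # shards the indices across ranks
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)

    model = model_factory().to(device)
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.Adam(ddp_model.parameters())

    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # different shuffle each epoch
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = torch.nn.functional.mse_loss(ddp_model(inputs), targets)
            loss.backward()  # DDP averages gradients across all GPUs here
            optimizer.step()

    dist.destroy_process_group()
```

Because each rank computes gradients on its own shard and DDP all-reduces them during `backward()`, every model replica stays in sync; there is no separate "combine the outputs" step.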