Hi!
My `Dataset` stores a list of parameter values that a computationally intensive simulation transforms into the high-dimensional tensors I feed to my neural network. Saving the final tensors in the dataset would be infeasible: I am still working on the simulation itself, so I would have to regenerate them often before even being able to start training.
I therefore put the generation of the input tensors into the `Dataset`'s `__getitem__` method and accelerate the computation with my GPU, which puts me in the (ill-advised) position of loading my tensors onto the GPU inside the `Dataset` rather than afterwards. In other words, the (target) transform in my `Dataset` is compute-intensive and runs on the GPU.
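For concreteness, here is a minimal sketch of that setup; `run_simulation` and `SimulationDataset` are hypothetical names, and the dummy outer product just stands in for my actual simulation step:

```python
import torch
from torch.utils.data import Dataset


def run_simulation(params: torch.Tensor) -> torch.Tensor:
    # Dummy stand-in for the real, compute-intensive simulation;
    # in reality this is a long GPU computation.
    return torch.outer(params, params)


class SimulationDataset(Dataset):
    # Stores only the parameter values; the expensive transform runs on
    # the GPU inside __getitem__, so the returned sample already lives
    # on self.device.
    def __init__(self, parameters, device="cuda:0"):
        self.parameters = parameters
        self.device = device

    def __len__(self):
        return len(self.parameters)

    def __getitem__(self, idx):
        params = torch.as_tensor(self.parameters[idx], device=self.device)
        return run_simulation(params)
```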
I actually have multiple GPUs at my disposal that I would like to take advantage of.
How do I proceed?
I imagine needing to spawn multiple threads and give each its own device (and its own `Dataset` and/or `DataLoader`?). How would I combine the outputs of these threads and move them onto the device on which the backpropagation happens?
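Something like the following is roughly what I have in mind, written here without real threading just to illustrate how the per-device outputs might be combined (it continues the `SimulationDataset` sketch above, and `parameters` is again a dummy list):

```python
import torch
from torch.utils.data import DataLoader

# Dummy parameter list so the snippet runs; in reality these are my
# simulation parameters.
parameters = [torch.randn(16) for _ in range(64)]

train_device = torch.device("cuda:0")  # where the model / backward pass lives
gen_devices = [torch.device(f"cuda:{i}")
               for i in range(torch.cuda.device_count())]

# Split the parameters across GPUs and give each GPU its own
# Dataset/DataLoader pair that generates batches on that device.
shards = [parameters[i::len(gen_devices)] for i in range(len(gen_devices))]
loaders = [
    DataLoader(SimulationDataset(shard, device=dev), batch_size=8)
    for shard, dev in zip(shards, gen_devices)
]

for batches in zip(*loaders):
    # Pull one batch from every generator GPU, move them to the training
    # device and concatenate them there.
    batch = torch.cat([b.to(train_device) for b in batches], dim=0)
    # ... forward / backward pass with `batch` ...
```

Is this a sensible direction, or is there a better-supported way to do it?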
Thank you very much!