Does this mean you are using a batch size of 1?
If so, nn.DataParallel cannot be used, as the batch will be chunked in the batch dimension (dim0) and each chunk will be scattered to the corresponding device.
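To illustrate why a batch size of 1 cannot be scattered, here is a minimal sketch using `torch.chunk`, which is roughly how `nn.DataParallel` splits the input along dim0 (the exact scatter logic lives inside PyTorch, so this is just an approximation of the behavior):

```python
import torch

# A batch shaped like the one described below: [batch=1, 5 images, 3 channels, H, W]
# (smaller H and W here just to keep the example light)
batch = torch.randn(1, 5, 3, 32, 32)

# Splitting into 2 chunks along dim0, as DataParallel would for 2 GPUs:
chunks = torch.chunk(batch, 2, dim=0)
print(len(chunks))  # 1 -> only one chunk exists, so the second GPU would get no data
```

Since dim0 has size 1, only a single chunk can be produced and the second device would sit idle.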
If the model is too large for a single GPU, you could use model parallel, yes.
However, I’m currently unsure which use case applies here.
What’s the largest batch size for a single GPU and what batch sizes are you using for the data parallel approach?
I’m terribly sorry for not noticing this message earlier.
The largest batch size is 1.
But one batch contains 5 images, each at a certain resolution, say [1920, 1080].
So the shape of one batch is [1, 5, 3, 1920, 1080].
Downsampling the images is not possible due to the objective, and all 5 images are required per forward pass. The only way to address this problem seems to be increasing the GPU VRAM, but I am already using a Tesla V100.
I was wondering: given two GPUs with, say, 16 GB + 16 GB, is there a way to distribute inference across both of them (e.g. via model parallelism) so that the batch can be processed this way?
Is there any other way to deal with this? I was interested to know if this is possible.
Yes, this would be possible. Here is a simple example of model sharding.
Basically, you can push submodules to specific devices and would have to make sure to push the activations in the forward method to the right device as well.
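A minimal sketch of this idea is below. The module names (`stage0`, `stage1`) and layer sizes are placeholders, and the device selection falls back to CPU when two GPUs are not available, so the example stays runnable anywhere:

```python
import torch
import torch.nn as nn

# Placeholder device assignment: use two GPUs if present, else fall back to CPU
dev0 = "cuda:0" if torch.cuda.device_count() > 1 else "cpu"
dev1 = "cuda:1" if torch.cuda.device_count() > 1 else "cpu"

class ShardedModel(nn.Module):
    def __init__(self):
        super().__init__()
        # first half of the model lives on dev0, second half on dev1
        self.stage0 = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()).to(dev0)
        self.stage1 = nn.Sequential(nn.Conv2d(8, 4, 3, padding=1), nn.ReLU()).to(dev1)

    def forward(self, x):
        x = self.stage0(x.to(dev0))
        # move the activation to the device holding the next submodule
        x = self.stage1(x.to(dev1))
        return x

model = ShardedModel()
out = model(torch.randn(1, 3, 16, 16))
print(out.shape)  # torch.Size([1, 4, 16, 16])
```

The key point is the explicit `x.to(dev1)` between the stages: each submodule only sees tensors on its own device, so the activations must be moved manually at the shard boundary.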
Let me know, if you get stuck or need more information.