PyTorch DataParallel not using second GPU during Inference

Yes, this would be possible. Here is a simple example of model sharding.
Basically, you can push each submodule to a specific device; in the forward method you then have to make sure to move the activations to the device of the next submodule.
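
A minimal sketch, assuming two GPUs (`cuda:0` and `cuda:1`); the module structure and layer sizes are arbitrary and just for illustration:

```python
import torch
import torch.nn as nn

class ShardedModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Push each submodule to its own device
        self.part1 = nn.Linear(1024, 1024).to('cuda:0')
        self.part2 = nn.Linear(1024, 10).to('cuda:1')

    def forward(self, x):
        # First submodule runs on cuda:0
        x = self.part1(x.to('cuda:0'))
        # Move the activation to cuda:1 before the second submodule
        x = self.part2(x.to('cuda:1'))
        return x

model = ShardedModel()
x = torch.randn(8, 1024)
out = model(x)
print(out.device)  # cuda:1, since the last submodule lives there
```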

Let me know if you get stuck or need more information.