PyTorch DataParallel not using second GPU during Inference

Yes, this would be possible. Here is a simple example of model sharding.
Basically, you can push each submodule to a specific device; in the forward method you then have to make sure to move the activations to the device of the next submodule.
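
A minimal sketch, assuming two GPUs (`cuda:0` and `cuda:1`); the module structure and layer sizes are arbitrary and just for illustration:

```python
import torch
import torch.nn as nn

class ShardedModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Push each submodule to its own device
        self.part1 = nn.Linear(1024, 1024).to('cuda:0')
        self.part2 = nn.Linear(1024, 10).to('cuda:1')

    def forward(self, x):
        # First submodule runs on cuda:0
        x = self.part1(x.to('cuda:0'))
        # Move the activation to cuda:1 before the second submodule
        x = self.part2(x.to('cuda:1'))
        return x

model = ShardedModel()
x = torch.randn(8, 1024)
out = model(x)
print(out.device)  # cuda:1, since the last submodule lives there
```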

Let me know if you get stuck or need more information.