Yes, this would be possible. Here is a simple example of model sharding.
Basically, you can push submodules to specific devices, and then you have to make sure to push the activations to the right device in the `forward` method.
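A minimal sketch of this idea, assuming a toy two-part model. The class name `ShardedModel`, the layer sizes, and the device choices are hypothetical and only for illustration. Each submodule is moved to its device once in `__init__`, and `forward` moves the activation to the next submodule's device before calling it. If fewer than two GPUs are available, the sketch falls back to the CPU so it still runs:

```python
import torch
import torch.nn as nn


class ShardedModel(nn.Module):
    """Hypothetical toy model: each submodule lives on its own device."""

    def __init__(self, dev0, dev1):
        super().__init__()
        self.dev0 = torch.device(dev0)
        self.dev1 = torch.device(dev1)
        # Push each submodule to its target device once, at construction time.
        self.part1 = nn.Sequential(nn.Linear(10, 20), nn.ReLU()).to(self.dev0)
        self.part2 = nn.Linear(20, 5).to(self.dev1)

    def forward(self, x):
        # Move the input to the device holding the first submodule.
        x = self.part1(x.to(self.dev0))
        # Move the intermediate activation to the device of the next submodule.
        x = self.part2(x.to(self.dev1))
        return x


# Use two GPUs if available; otherwise fall back to the CPU so the sketch still runs.
if torch.cuda.device_count() >= 2:
    dev0, dev1 = "cuda:0", "cuda:1"
else:
    dev0, dev1 = "cpu", "cpu"

model = ShardedModel(dev0, dev1)
x = torch.randn(8, 10)
out = model(x)

# The output lives on the last device, so create (or move) the target there.
target = torch.randn(8, 5, device=out.device)
loss = nn.functional.mse_loss(out, target)
# Autograd tracks the .to() calls, so gradients flow back across devices.
loss.backward()
print(out.shape, out.device)
```

Since the `.to()` calls are part of the autograd graph, no extra work is needed for the backward pass. Note that this naive split runs the devices sequentially, so it mainly helps when the model does not fit on a single device rather than speeding up training.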
Let me know if you get stuck or need more information.