How can I load a PyTorch (.pth.tar) model split across several GPUs at once?

I have a clear understanding of how to load a PyTorch model from a .pth.tar file onto a single prescribed GPU, but in my situation the entire model doesn't fit on one GPU. So I need to load a pipeline-parallelized model (from a .pth.tar file) onto several GPUs at once, with different layers living on different devices.
Is this possible in plain PyTorch, without additional libraries? If so, how can I implement it?
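To make the setup concrete, here is a minimal sketch of the kind of thing I mean, using a hypothetical two-stage model (the layer sizes, checkpoint keys, and file name are made up; the devices fall back to CPU when fewer GPUs are available):

```python
import os
import tempfile

import torch
import torch.nn as nn

# Pick one device per pipeline stage; fall back to CPU if GPUs are missing.
dev0 = torch.device("cuda:0" if torch.cuda.device_count() >= 1 else "cpu")
dev1 = torch.device("cuda:1" if torch.cuda.device_count() >= 2 else dev0)


class PipelinedNet(nn.Module):
    """Hypothetical model split into two stages on two devices."""

    def __init__(self):
        super().__init__()
        self.stage1 = nn.Linear(16, 32).to(dev0)
        self.stage2 = nn.Linear(32, 4).to(dev1)

    def forward(self, x):
        x = torch.relu(self.stage1(x.to(dev0)))
        # Hand the activations over to the second stage's device.
        return self.stage2(x.to(dev1))


model = PipelinedNet()

# Simulate a saved checkpoint so the example is self-contained.
path = os.path.join(tempfile.gettempdir(), "model.pth.tar")
torch.save({"state_dict": model.state_dict()}, path)

# Key idea: map the checkpoint to CPU first so nothing is forced onto a
# single GPU; load_state_dict then copies each tensor onto the device
# where the corresponding parameter already lives (dev0 or dev1 here).
checkpoint = torch.load(path, map_location="cpu")
model.load_state_dict(checkpoint["state_dict"])

out = model(torch.randn(2, 16))
print(out.shape)  # torch.Size([2, 4])
```

My understanding is that `map_location="cpu"` plus moving the submodules to their target devices before `load_state_dict` should achieve the placement, but I am not sure this is the idiomatic way, or whether PyTorch offers a more direct mechanism for this.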
Any pointers would be appreciated.