How to train a model with a huge number of classes

Thanks for your reply. What I mean is model parallelism rather than data parallelism. It seems that putting the softmax layer on the CPU and the other layers on GPUs is one way to do this in PyTorch.
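
To make the idea concrete, here is a rough sketch of what I have in mind. It is only an illustration under my own assumptions: `SplitModel`, the layer sizes, and the device choices are hypothetical, and the number of classes is kept small so it runs anywhere. The feature layers live on a GPU (falling back to the CPU if no GPU is available), while the large classification layer that feeds the softmax stays in CPU memory.

```python
import torch
import torch.nn as nn


class SplitModel(nn.Module):
    """Hypothetical model split across two devices (model parallelism)."""

    def __init__(self, in_features, hidden, num_classes, body_device, head_device):
        super().__init__()
        self.body_device = torch.device(body_device)
        self.head_device = torch.device(head_device)
        # Feature layers: placed on the GPU when one is available.
        self.body = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
        ).to(self.body_device)
        # Classification layer: its weight is (num_classes x hidden), which is
        # the part that grows with the number of classes, so it stays on the CPU.
        self.head = nn.Linear(hidden, num_classes).to(self.head_device)

    def forward(self, x):
        h = self.body(x.to(self.body_device))
        # Move activations to the head's device; autograd tracks this copy,
        # so gradients flow back across devices.
        return self.head(h.to(self.head_device))


# Assumption: a single GPU at cuda:0; otherwise everything runs on the CPU.
body_device = "cuda:0" if torch.cuda.is_available() else "cpu"
num_classes = 1000  # illustrative only; the real case would be much larger

model = SplitModel(32, 64, num_classes, body_device, "cpu")
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()  # applies log-softmax internally

x = torch.randn(8, 32)                    # dummy batch
y = torch.randint(0, num_classes, (8,))   # dummy labels, kept on the CPU

logits = model(x)            # logits come out on the head's device (CPU)
loss = criterion(logits, y)  # so the softmax and loss are computed on the CPU
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The trade-off is that the hidden activations are copied from GPU to CPU on every forward pass and their gradients are copied back on every backward pass, and the big matrix multiply runs on the CPU, so I would expect this to be slower than keeping everything on the GPU whenever the layer fits there.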