Thanks for your reply. What I mean is model parallelism rather than data parallelism. In PyTorch, one way to do it seems to be putting the softmax layer on the CPU and the other layers on the GPUs.
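Here is a rough sketch of that split, in case it helps make the idea concrete. It is my own illustration, not code from any particular project. The class name `SplitModel`, the layer sizes and the two-block layout are all hypothetical. The approach is plain PyTorch device placement: each submodule is moved with `.to(device)`, and the activations are moved explicitly between devices in `forward`. Autograd follows those `.to()` calls, so `backward()` runs across devices without extra code. If fewer than two GPUs are available, the sketch falls back so it still runs on a CPU-only machine.

```python
import torch
import torch.nn as nn


class SplitModel(nn.Module):
    """Hypothetical model: hidden blocks on GPU(s), output layer + softmax on CPU."""

    def __init__(self, in_dim=16, hidden=32, n_classes=10):
        super().__init__()
        # Use up to two GPUs; fall back to CPU so the sketch runs without CUDA.
        n_gpu = torch.cuda.device_count()
        self.dev0 = torch.device("cuda:0") if n_gpu > 0 else torch.device("cpu")
        self.dev1 = torch.device("cuda:1") if n_gpu > 1 else self.dev0
        self.block0 = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU()).to(self.dev0)
        self.block1 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()).to(self.dev1)
        # Output projection is left on the CPU (the default device), next to the softmax.
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, x):
        # Move activations to whichever device holds the next block.
        x = self.block0(x.to(self.dev0))
        x = self.block1(x.to(self.dev1))
        logits = self.out(x.to("cpu"))
        # Softmax is computed on the CPU.
        return torch.softmax(logits, dim=-1)


if __name__ == "__main__":
    model = SplitModel()
    probs = model(torch.randn(4, 16))
    print(probs.shape, probs.device)  # torch.Size([4, 10]) cpu
```

This is only worth it when the output layer is very large (e.g. a big vocabulary) and does not fit in GPU memory. Otherwise the device-to-host copy on every forward and backward pass will usually cost more than it saves. For training you would normally use `log_softmax` with `nn.NLLLoss` (or raw logits with `nn.CrossEntropyLoss`) instead of a plain softmax.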