For example, I am training a face recognition model with millions of identities. Besides triplet loss, I would like to use softmax-based losses such as ArcFace loss, AM-Softmax and so on. However, with such a huge number of classes, GPU memory will be insufficient. Is there a way to train a model like this? Maybe splitting the softmax layer across multiple GPUs would work. I wonder whether PyTorch supports this.
PyTorch does support multiple GPUs. Also look into something called Probabilistic Classification. This technique is mostly used in NLP to predict the next word, so it falls under the huge-class classification problem.
You can also look here for parallel GPU processing in PyTorch.
I hope this helps!
Thanks for your reply. What I mean is model parallelism rather than data parallelism. It seems that putting the softmax layer on the CPU and the other layers on GPUs is one way to do this in PyTorch.
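To make the idea concrete, here is a rough sketch of keeping the large classification layer on the CPU while the rest of the network runs on a GPU. The class name BackboneWithCpuHead, the tiny stand-in backbone, and all sizes are made up for illustration; a real face model would use a deep CNN and a margin-based head. It falls back to the CPU when no GPU is present.

```python
import torch
import torch.nn as nn


class BackboneWithCpuHead(nn.Module):
    """Hypothetical example: feature extractor on one device, big classifier on the CPU."""

    def __init__(self, embed_dim, num_classes, backbone_device):
        super().__init__()
        self.backbone_device = torch.device(backbone_device)
        # Stand-in backbone (assumption); it lives on the GPU when one is available.
        self.backbone = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * 8 * 8, embed_dim),
        ).to(self.backbone_device)
        # The large classification layer is never moved, so its weights stay in host memory.
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, images):
        embeddings = self.backbone(images.to(self.backbone_device))
        # Copy the small embedding batch to the CPU and compute the logits there.
        return self.head(embeddings.cpu())


backbone_device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = BackboneWithCpuHead(embed_dim=64, num_classes=5000, backbone_device=backbone_device)
images = torch.randn(4, 3, 8, 8)
labels = torch.randint(0, 5000, (4,))
logits = model(images)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
```

Autograd tracks the device copy, so gradients flow back from the CPU head into the backbone. The trade-off is that the large matrix multiply runs on the CPU, which is likely to be much slower than on a GPU.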
Yes, it is possible. Refer to the topic below.
That would be nice. By the way, is it possible to split a single layer, such as a large fc layer, across multiple devices?
You can create submodules of the model and place them on multiple devices for training:
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, split_gpus):
        super().__init__()
        self.large_submodule1 = ...
        self.large_submodule2 = ...
        self.split_gpus = split_gpus
        if split_gpus:
            # Place each submodule on its own GPU
            self.large_submodule1.cuda(0)
            self.large_submodule2.cuda(1)

    def forward(self, x):
        # x is expected to be on GPU 0 already when split_gpus is set
        x = self.large_submodule1(x)
        if self.split_gpus:
            x = x.cuda(1)  # P2P GPU transfer
        return self.large_submodule2(x)
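For the question about splitting one large fc layer itself, a possible approach is to shard its output classes across devices and concatenate the partial logits. The following is only a minimal sketch under that assumption, not a built-in PyTorch feature; ShardedLinear and the sizes used are hypothetical names and values chosen for illustration. It falls back to the CPU when fewer than two GPUs are available.

```python
import torch
import torch.nn as nn


class ShardedLinear(nn.Module):
    """Hypothetical helper: one logical fc layer whose output classes are split across devices."""

    def __init__(self, in_features, num_classes, devices):
        super().__init__()
        self.devices = [torch.device(d) for d in devices]
        # Split the classes as evenly as possible, one shard per device.
        base, extra = divmod(num_classes, len(self.devices))
        sizes = [base + (1 if i < extra else 0) for i in range(len(self.devices))]
        self.shards = nn.ModuleList(
            [nn.Linear(in_features, size).to(dev) for size, dev in zip(sizes, self.devices)]
        )

    def forward(self, x):
        # Each shard computes the logits for its own slice of classes on its own device.
        parts = [shard(x.to(dev)) for shard, dev in zip(self.shards, self.devices)]
        # Gather the slices on the first device to form the full logit matrix.
        return torch.cat([p.to(self.devices[0]) for p in parts], dim=1)


# Use two GPUs when present; otherwise run both shards on the CPU.
if torch.cuda.device_count() >= 2:
    devices = ["cuda:0", "cuda:1"]
else:
    devices = ["cpu", "cpu"]

layer = ShardedLinear(in_features=128, num_classes=1001, devices=devices)
features = torch.randn(4, 128)
labels = torch.randint(0, 1001, (4,))
logits = layer(features)
loss = nn.functional.cross_entropy(logits, labels.to(logits.device))
loss.backward()
```

This spreads the weight matrix over the devices, but the full batch-by-classes logit matrix is still gathered on the first device. With millions of classes that gather may itself become the bottleneck, in which case the softmax normalization would have to be computed per shard instead.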