When writing a custom nn.Module, what is the best practice (or what is commonly used): having the model output the logits or the probabilities?
Consider these two simple cases:
1. the model outputs the logits:
```python
class Network(nn.Module):
    def forward(self, x):
        ...
        logits = self.last_layer(x)
        return logits

# Training
data, target = ...
net = Network()
logits = net(data)
loss = nn.functional.some_loss_with_logits(logits, target)
...

# Predicting
data = ...
logits = net(data)
probabilities = nn.functional.some_squashing_function(logits)
```
- the training code is clean
- using a loss function with logits is straightforward
- during inference, one has to remember that the network outputs logits, not probabilities, and apply whatever sigmoid/softmax/other squashing is required (a concrete sketch follows this list)
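For concreteness, here is a minimal runnable sketch of option 1 for binary classification. The `BinaryNet` name, layer sizes, and the choice of `binary_cross_entropy_with_logits`/`sigmoid` are illustrative assumptions, not part of the generic snippet above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(10, 32)      # placeholder sizes
        self.last_layer = nn.Linear(32, 1)

    def forward(self, x):
        # raw logits, no squashing here
        return self.last_layer(torch.relu(self.hidden(x)))

net = BinaryNet()
data = torch.randn(4, 10)
target = torch.randint(0, 2, (4, 1)).float()

# Training: the *_with_logits loss consumes logits directly (numerically stable)
loss = F.binary_cross_entropy_with_logits(net(data), target)
loss.backward()

# Predicting: the caller must remember to squash the logits
with torch.no_grad():
    probabilities = torch.sigmoid(net(data))
```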
2. the model outputs the probabilities:
```python
class Network(nn.Module):
    def logits(self, x):
        ...
        logits = self.last_layer(x)
        return logits

    def forward(self, x):
        logits = self.logits(x)
        return nn.functional.some_squashing_function(logits)

# Training
data, target = ...
net = Network()
logits = net.logits(data)
loss = nn.functional.some_loss_with_logits(logits, target)
...

# Predicting
data = ...
probabilities = net(data)
```
- making a prediction now looks more PyTorch-like
- during training, one has to remember to call the custom `logits` method
- this doesn't play well with other things, like wrapping the model in `nn.DataParallel`, because the wrapper does not expose the `logits` method (see the sketch after this list)
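To illustrate the `nn.DataParallel` issue, a self-contained sketch of option 2; the layer, its sizes, and the sigmoid are placeholders:

```python
import torch
import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.last_layer = nn.Linear(10, 1)  # placeholder head

    def logits(self, x):
        return self.last_layer(x)

    def forward(self, x):
        return torch.sigmoid(self.logits(x))

net = nn.DataParallel(Network())
data = torch.randn(4, 10)

probabilities = net(data)          # forward() is routed through the wrapper
# net.logits(data)                 # AttributeError: 'DataParallel' object has no attribute 'logits'
logits = net.module.logits(data)   # reachable, but bypasses the data-parallel scatter/gather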
A third option would be to turn the squashing layer on and off based on the mode of the network, i.e. net.train() and net.eval(). However, this would hide the network's behavior even more.
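A minimal sketch of that third option, keyed off the `self.training` flag that `net.train()`/`net.eval()` toggle; the layer sizes and the sigmoid are again placeholders:

```python
import torch
import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.last_layer = nn.Linear(10, 1)  # placeholder head

    def forward(self, x):
        logits = self.last_layer(x)
        if self.training:             # flipped by net.train() / net.eval()
            return logits             # training mode: hand logits to the loss
        return torch.sigmoid(logits)  # eval mode: return probabilities

net = Network()
x = torch.randn(4, 10)
net.train(); out = net(x)  # logits
net.eval();  out = net(x)  # probabilities
```

The catch is visible in the usage lines: the same call returns different quantities depending on hidden state, which is exactly what makes this option the most opaque of the three.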