KLDivLoss and transforming back to probabilities

jakob1427 · March 13, 2021, 7:57pm

I recently changed the targets for my model so that each one is a probability distribution, and I am now using KLDivLoss as my loss function. I know that, for numerical stability, the PyTorch KLDivLoss implementation expects log-probabilities as input, so the output of my final layer is passed through log_softmax.

My two questions are:

1. Am I correct in assuming that, to convert the output of log_softmax back to a regular softmax (i.e. a vector of probabilities representing a probability distribution), I should take the exponential of the output? That is, if the model output is output = log_softmax(dense, dim=1), could I then do np.exp(output) to bring the vector back to the same range as my y_targets?
2. After doing the above and training my model, the output is 5 orders of magnitude smaller than the y_targets! I know the model as a whole usually works: I was previously using more discrete y_targets with this model and MSE loss, and that worked very well. The abbreviated code is below:

Model Class:

def build_model(self):
  '''
  Main model build code...
  '''
  self.dense = nn.Linear(self.c['hidden_size'], self.c['y_distribution_size'])
  self.loss = nn.KLDivLoss()
  self.optimizer = optim.AdamW(self.parameters(), lr=self.c['optim_lr'])

def forward(self):
  '''
  Main model forward code...
  '''
  dense = self.dense(self.dropout(self.sigmoid(x_out)))
  return log_softmax(dense, dim=1)

def fit(self):
  '''
  Main model fit code...
  '''
  self.optimizer.zero_grad()
  model_out = self(x)
  loss = self.loss(model_out, y)
  loss.backward()
  self.optimizer.step()
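
For reference, here is a minimal plain-Python sketch of what a KL-divergence loss on log-probability inputs computes, written without PyTorch so it can be run on its own. The function names and the example data are hypothetical. The reduction behaviour reflects my reading of the PyTorch documentation: the default reduction="mean" divides by the total number of elements, while reduction="batchmean" divides by the batch size and matches the mathematical definition.

```python
import math


def kl_row(log_pred, target):
    """Sum of target * (log(target) - log_pred) over one row.

    Terms with target == 0 contribute 0, following the usual
    convention that 0 * log(0) is treated as 0.
    """
    total = 0.0
    for lp, t in zip(log_pred, target):
        if t > 0.0:
            total += t * (math.log(t) - lp)
    return total


def kl_div_loss(log_preds, targets, reduction="batchmean"):
    """Hypothetical helper mirroring the three common reductions."""
    total = sum(kl_row(lp, t) for lp, t in zip(log_preds, targets))
    if reduction == "sum":
        return total
    if reduction == "batchmean":
        # Divide by the number of rows (the batch size N).
        return total / len(targets)
    if reduction == "mean":
        # Divide by the number of elements (N * number of classes).
        return total / sum(len(t) for t in targets)
    raise ValueError(f"unknown reduction: {reduction}")


# Hypothetical targets shaped like the (N, 5) tensor in the post.
targets = [
    [0.2, 0.4, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5, 0.0],
]
# Hypothetical predictions: a uniform distribution, as log-probabilities.
uniform_log_probs = [math.log(0.2)] * 5
log_preds = [uniform_log_probs, uniform_log_probs]

batchmean_loss = kl_div_loss(log_preds, targets, "batchmean")
mean_loss = kl_div_loss(log_preds, targets, "mean")
```

With 5 classes, mean_loss is exactly one fifth of batchmean_loss, so the choice of reduction changes the scale of the loss but not what the model is asked to fit.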

Y_target Tensor
Shape: (N, 5)
where each of the N rows is a vector such as [0.2, 0.4, 0.4, 0, 0] (sum of probabilities = 1)

Model Output Tensor
y_pred = model(x) where y_pred example is:
[-12.440563, -12.761744, -13.408863, -12.697731, -12.389305]
and I am converting it to y_p_pred = np.exp(y_pred) which gives an example result of: [3.9548686e-06, 2.8684360e-06, 1.5017746e-06, 3.0580563e-06, 4.1628732e-06]
(sum of probabilities = 1.554601e-05 for that particular example)!

So I am trying to get the model to fit the resultant y_p_pred to y_target.
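
As a sanity check on the conversion described above, here is a small plain-Python sketch (no PyTorch, hypothetical helper names and logits). It shows that exponentiating a log-softmax gives values that sum to 1 along the axis that was normalised, and that the posted example row does not. One possible explanation, which the post does not confirm, is that the normalisation ran along a different axis than the one being summed; the last part of the sketch illustrates that effect on made-up data.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over a single list of logits."""
    m = max(row)
    log_norm = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_norm for v in row]


# Round trip on hypothetical logits: exp(log_softmax(x)) is softmax(x).
logits = [1.0, 2.0, 0.5, -1.0, 0.0]
probs = [math.exp(v) for v in log_softmax(logits)]
prob_sum = sum(probs)  # equals 1 up to floating-point rounding

# The row quoted in the post: its exponentials sum to far less than 1,
# so it cannot be a log-softmax taken over these 5 values.
posted = [-12.440563, -12.761744, -13.408863, -12.697731, -12.389305]
posted_sum = sum(math.exp(v) for v in posted)


def log_softmax_columns(matrix):
    """Apply log_softmax down each column instead of across each row."""
    columns = [log_softmax(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


# Hypothetical (3, 5) batch normalised along the wrong axis.
batch = [
    [1.0, 2.0, 0.5, -1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [3.0, -1.0, 2.0, 0.0, 1.0],
]
wrong = log_softmax_columns(batch)
row_sums = [sum(math.exp(v) for v in row) for row in wrong]
col_sums = [sum(math.exp(v) for v in col) for col in zip(*wrong)]
```

Here col_sums are all 1 while row_sums are not, so checking which axis of the exponentiated output sums to 1 (and the output's shape) may help narrow down where the mismatch comes from.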