Having recently adapted my model's targets to follow a probability distribution, I am using KLDivLoss as my loss function. I know that, for numerical stability, the PyTorch KLDivLoss implementation takes log-probabilities as input, so the output of my final layer is transformed using log_softmax.
My 2 questions are:
- Am I correct in assuming that, to convert the output of log_softmax back to a normal softmax (i.e. a vector of probabilities representing a probability distribution), I would take the natural exponential of the output? That is, if the model output is output = log_softmax(dense, dim=1), could I then do np.exp(output) to get the vector back into the range of my y_targets?
- Upon doing the above and training my model, the output is 5 orders of magnitude smaller than the y_targets! I know the model as a whole usually works (I was previously using more discrete y_targets with this model and MSE loss, which worked very well). The abbreviated code is below:
Model Class:
def build_model(self):
    '''
    Main model build code...
    '''
    self.dense = nn.Linear(self.c['hidden_size'], self.c['y_distribution_size'])
    self.loss = nn.KLDivLoss()
    self.optimizer = optim.AdamW(self.parameters(), lr=self.c['optim_lr'])

def forward(self):
    '''
    Main model forward code...
    '''
    dense = self.dense(self.dropout(self.sigmoid(x_out)))
    return log_softmax(dense, dim=1)

def fit(self):
    '''
    Main model fit code...
    '''
    self.optimizer.zero_grad()
    model_out = self(x)
    loss = self.loss(model_out, y)
    loss.backward()
    self.optimizer.step()
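As a minimal sketch of the round trip asked about in the first question: for a 2-D tensor of shape (N, 5), taking the exponential of log_softmax(..., dim=1) should reproduce softmax, with every row summing to 1. The random `logits` tensor is a hypothetical stand-in for the dense layer output, not the actual model.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for the final dense layer's output: shape (N, 5).
logits = torch.randn(4, 5)

log_probs = F.log_softmax(logits, dim=1)  # same call as in forward()
probs = log_probs.exp()                   # back to probabilities

# exp(log_softmax(x)) matches softmax(x) up to floating-point error.
same_as_softmax = torch.allclose(probs, F.softmax(logits, dim=1), atol=1e-6)

# Each row is a distribution over the 5 classes, so each row sums to 1.
row_sums = probs.sum(dim=1)
rows_sum_to_one = torch.allclose(row_sums, torch.ones(4), atol=1e-6)

print(same_as_softmax, rows_sum_to_one)
```

Note that dim=1 normalises over the second axis. If the tensor reaching log_softmax has more than two dimensions, that axis may not be the class axis, and the sums along the last axis need not be 1.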
Y_target Tensor
Shape: (N, 5)
where each of the N rows is a vector such as [0.2, 0.4, 0.4, 0, 0] (sum of probabilities = 1)
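The following is a rough sketch, under assumptions, of how nn.KLDivLoss treats a target of this form: the input is log-probabilities, the target is plain probabilities, and zero entries in the target contribute nothing. The `logits` values are made up for illustration. The sketch uses reduction="batchmean", which the PyTorch documentation describes as matching the mathematical definition of KL divergence; the code in the post uses the default reduction instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Target in the format described above; zero entries are allowed.
y = torch.tensor([[0.2, 0.4, 0.4, 0.0, 0.0]])

# Hypothetical logits standing in for the dense layer output.
logits = torch.tensor([[0.5, 1.0, 1.0, -2.0, -2.0]])
log_probs = F.log_softmax(logits, dim=1)

# 'batchmean' sums over all elements and divides by the batch size.
loss_fn = nn.KLDivLoss(reduction="batchmean")
loss = loss_fn(log_probs, y)

# Manual KL(y || p), using torch.xlogy so that 0 * log(0) counts as 0.
manual = (torch.xlogy(y, y) - y * log_probs).sum() / y.shape[0]

print(loss.item(), manual.item())
```

The two printed values should agree up to floating-point error, which is one way to confirm that the loss is receiving log-probabilities and probabilities in the expected argument order.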
Model Output Tensor
y_pred = model(x)
where an example y_pred is:
[-12.440563, -12.761744, -13.408863, -12.697731, -12.389305]
and I am converting it with y_p_pred = np.exp(y_pred), which gives an example result of: [3.9548686e-06, 2.8684360e-06, 1.5017746e-06, 3.0580563e-06, 4.1628732e-06] (sum of probabilities = 1.554601e-05 for that particular example)!
So I am trying to get the model to fit the resultant y_p_pred to y_target.
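The arithmetic on the example row can be reproduced with the standard library alone. This sketch only recomputes the numbers quoted in the post; it makes no claim about what the model does internally.

```python
import math

# Log-values copied from the example y_pred in the post.
y_pred = [-12.440563, -12.761744, -13.408863, -12.697731, -12.389305]

# Element-wise exponential, as np.exp(y_pred) would compute.
y_p_pred = [math.exp(v) for v in y_pred]
total = sum(y_p_pred)

# Log of the total mass; 0.0 would correspond to a row that sums to 1.
log_total = math.log(total)

print(total, log_total)
```

If log_softmax had normalised these five values together, `total` would be 1 and `log_total` would be 0. A total near 1.55e-05 indicates that these five entries were not normalised as a group, which may be worth checking against the shape of the tensor passed to log_softmax and the dim argument.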