I implemented my loss function following https://arxiv.org/pdf/1907.08956v1.pdf, but the training loss becomes extremely negative. Specifically, I'm minimizing the negative log-likelihood, with MSE as my reconstruction loss. Why might this happen?

My code for the loss function is:

```
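import torch
import torch.nn as nn

# Reconstruction term: mean squared error averaged over all elements
MSELoss_criterion = nn.MSELoss()
MSE_loss = MSELoss_criterion(y_hat, tgts)
# KL(q || N(0, I)): summed over the latent dimension (dim 2), then averaged
KLDiv_loss = -0.5 * torch.sum(1 + log_var_q - mu_q**2 - log_var_q.exp(), dim=2)
KLDiv_loss = torch.mean(KLDiv_loss)
return -MSE_loss + KLDiv_loss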
```
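
For reference, here is a minimal sketch of the sign convention usually used for this kind of objective, assuming the paper's loss is the standard negative ELBO (I have not checked it against the paper term by term). Under a Gaussian likelihood with fixed variance, the negative log-likelihood is the squared error up to a scale and a constant, so MSE already plays the role of the negative log-likelihood and would enter with a plus sign. With `-MSE_loss`, the optimizer is rewarded for making reconstructions worse, which would explain a loss that keeps going down without bound. The function name `vae_loss`, the use of `dim=-1`, and the toy tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def vae_loss(y_hat, tgts, mu_q, log_var_q):
    """Sketch of a negative ELBO: reconstruction error plus KL divergence.

    Assumes a standard normal prior and a diagonal Gaussian posterior
    parameterized by mu_q and log_var_q, with the latent axis last.
    """
    # Reconstruction term enters with a PLUS sign: lower error means lower loss.
    recon = F.mse_loss(y_hat, tgts)
    # KL(q || N(0, I)), summed over the latent axis; each entry is >= 0
    # because exp(x) >= 1 + x.
    kl = -0.5 * torch.sum(1 + log_var_q - mu_q.pow(2) - log_var_q.exp(), dim=-1)
    return recon + kl.mean()


# Toy inputs; the (batch, sequence, feature) shapes are assumptions.
torch.manual_seed(0)
y_hat = torch.randn(4, 5, 3)
tgts = torch.randn(4, 5, 3)
mu_q = torch.randn(4, 5, 2)
log_var_q = torch.randn(4, 5, 2)

loss = vae_loss(y_hat, tgts, mu_q, log_var_q)
# Same KL term, but with the reconstruction sign flipped as in the question.
flipped = loss - 2 * F.mse_loss(y_hat, tgts)
```

Both terms in `vae_loss` are non-negative, so the total is bounded below by zero, while the flipped version has no lower bound. Note that `F.mse_loss` averages over all elements whereas the KL term is summed over the latent axis, so the relative weight of the two terms is a separate choice that may need tuning.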