I want to train a 5-class classifier using the Adam optimizer with MSELoss as the loss function.
My model outputs the following tensor after the first training sample:
[-0.1180, -0.0932, -0.9693, 0.1546, -0.5936]
which becomes the following tensor after softmax is applied:
[0.2279, 0.2337, 0.0973, 0.2994, 0.1417]
This looks perfectly fine.
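Re-computing the softmax by hand does reproduce those probabilities (a quick check in plain Python, using the logits quoted above):

```python
import math

logits = [-0.1180, -0.0932, -0.9693, 0.1546, -0.5936]

# softmax: exponentiate each logit and normalise by the sum
exps = [math.exp(x) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

print(probs)  # approximately [0.2279, 0.2337, 0.0973, 0.2994, 0.1417]
```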
However, after the optimizer step, the outputs for the next thousands of samples all become:
[-24.2190, 25.5244, -24.5743, -24.6079, -24.7835]
=> [2.4931e-22, 1.0000e+00, 1.7475e-22, 1.6897e-22, 1.4176e-22] which is basically [0,1,0,0,0]
[-39.9633, 40.9215, -40.6142, -40.0271, -39.2909] =>
[7.4504e-36, 1.0000e+00, 3.8857e-36, 6.9895e-36, 1.4595e-35] which is basically [0,1,0,0,0]
[-52.3863, 53.6466, -52.7560, -53.2293, -52.4467] =>
[8.9228e-47, 1.0000e+00, 6.1649e-47, 3.8403e-47, 8.3996e-47] which is basically [0,1,0,0,0]
… etc.
Thus my model never learns. What am I doing wrong? The first iteration looks fine, but then the network starts outputting large numbers, which don't play well with softmax. Do I need to normalize the data before softmax? Do I need to use log_softmax? I can't seem to crack this problem, so any help is very much appreciated.
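For reference, here is a minimal sketch of the kind of setup described above; the model architecture, input shape, learning rate, and one-hot target encoding are all assumptions, since the original code isn't shown:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed stand-in for the real model: a single linear layer producing 5 logits
model = nn.Linear(10, 5)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.MSELoss()

x = torch.randn(1, 10)                         # one training sample (assumed shape)
target = torch.tensor([[0., 1., 0., 0., 0.]])  # assumed one-hot target

probs = torch.softmax(model(x), dim=1)  # softmax over the class dimension
loss = loss_fn(probs, target)           # MSE between probabilities and one-hot target

optimizer.zero_grad()
loss.backward()
optimizer.step()
```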
For reference, my loss over the first iterations is:
tensor(0.2033, dtype=torch.float64, grad_fn=)
tensor(0.2752, dtype=torch.float64, grad_fn=)
tensor(0.3007, dtype=torch.float64, grad_fn=)
tensor(0.4000, dtype=torch.float64, grad_fn=)
tensor(0.4000, dtype=torch.float64, grad_fn=)
tensor(0.4000, dtype=torch.float64, grad_fn=)
tensor(0.4000, dtype=torch.float64, grad_fn=)
tensor(0.4000, dtype=torch.float64, grad_fn=)
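The plateau at 0.4000 is consistent with the collapse shown above: once the softmax output is effectively [0,1,0,0,0] and the one-hot target (assumed encoding) has its 1 in a different position, exactly two of the five elements differ by 1, so the mean squared error is 2/5 = 0.4:

```python
# MSE between a saturated softmax output and a mismatched one-hot target
pred = [0.0, 1.0, 0.0, 0.0, 0.0]    # what the collapsed network produces
target = [1.0, 0.0, 0.0, 0.0, 0.0]  # assumed one-hot label for a different class

mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
print(mse)  # 0.4
```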