I am building a network with an LSTM encoder for sentence embedding and a two-layer MLP classifier ending in a Softmax. The loss is CrossEntropy. My training loss barely moves from ~1.6094 (which is ln 5, i.e. chance level for 5 classes), and the validation accuracy never changes. I printed the classifier output and realized all samples produce the same near-uniform probabilities over the 5 classes. I have tried tuning the learning rate and switching optimizers, but neither helped. Could anyone tell me what might be happening and what I should check next?
My LSTM encoder works fine for other problems, so I don’t think there is a bug in the architecture.
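For reference, here is a minimal sketch of the pipeline I'm describing — module names, vocabulary size, and hidden dimensions are placeholders, not my actual code, but the structure (LSTM encoder, two-layer MLP, Softmax on the output, CrossEntropy on that output) matches what I'm running:

```python
import torch
import torch.nn as nn

class SentenceClassifier(nn.Module):
    """LSTM encoder + two-layer MLP head, as described above (dims are placeholders)."""
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128, num_classes=5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )
        self.softmax = nn.Softmax(dim=1)

    def forward(self, tokens):
        emb = self.embedding(tokens)            # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.encoder(emb)         # h_n: (1, batch, hidden_dim)
        logits = self.classifier(h_n[-1])       # final hidden state -> class scores
        return self.softmax(logits)             # probabilities over the 5 classes

net = SentenceClassifier()
tokens = torch.randint(0, 1000, (4, 10))        # batch of 4 sentences, length 10
out = net(tokens)                               # (4, 5), rows sum to 1
# Loss is computed on the softmaxed output, as in my training loop:
loss = nn.CrossEntropyLoss()(out, torch.tensor([0, 1, 2, 3]))
```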
TRAIN EPOCH 0 LOSS 1.6094570651566242 TIME 3.570338030656179
TRAIN EPOCH 1 LOSS 1.6094265852944325 TIME 3.553918214639028
TRAIN EPOCH 2 LOSS 1.6094126903404624 TIME 3.553771432240804
TRAIN EPOCH 3 LOSS 1.6094204059428414 TIME 3.558300856749217
TRAIN EPOCH 4 LOSS 1.6094172122114796 TIME 3.561331601937612
VALIDATING EPOCH 4 ACC 0.19394736842105262
Update save model
TRAIN EPOCH 5 LOSS 1.609414727000867 TIME 3.5560521999994914
TRAIN EPOCH 6 LOSS 1.609416576428602 TIME 3.553495530287425
VALIDATING EPOCH 6 ACC 0.19394736842105262
TRAIN EPOCH 7 LOSS 1.6094154085816637 TIME 3.603946268558502
TRAIN EPOCH 8 LOSS 1.6094178372183763 TIME 3.600656755765279
VALIDATING EPOCH 8 ACC 0.19394736842105262
TRAIN EPOCH 9 LOSS 1.6094167562527846 TIME 3.5670180916786194
TRAIN EPOCH 10 LOSS 1.6094169845688815 TIME 3.558741509914398
VALIDATING EPOCH 10 ACC 0.19394736842105262
TRAIN EPOCH 11 LOSS 1.6094162363117024 TIME 3.572726861635844
TRAIN EPOCH 12 LOSS 1.609419370775169 TIME 3.5575267990430195
VALIDATING EPOCH 12 ACC 0.19394736842105262
TRAIN EPOCH 13 LOSS 1.6094190575982217 TIME 3.5639585892359418
TRAIN EPOCH 14 LOSS 1.6094135887878762 TIME 3.5603323658307393
VALIDATING EPOCH 14 ACC 0.19394736842105262
TRAIN EPOCH 15 LOSS 1.6094177766034832 TIME 3.560251947244008
TRAIN EPOCH 16 LOSS 1.6094157561070501 TIME 3.5713677604993186
VALIDATING EPOCH 16 ACC 0.19394736842105262
TRAIN EPOCH 17 LOSS 1.6094183625474487 TIME 3.5744518558184306
TRAIN EPOCH 18 LOSS 1.6094157695770264 TIME 3.56392617225647
VALIDATING EPOCH 18 ACC 0.19394736842105262
TRAIN EPOCH 19 LOSS 1.6094196570121635 TIME 3.549365504582723
>>> net.out
tensor([[0.1995, 0.1999, 0.1995, 0.2009, 0.2003],
        [0.1995, 0.1999, 0.1995, 0.2009, 0.2003],
        [0.1995, 0.1999, 0.1995, 0.2009, 0.2003],
        ...,
        [0.1995, 0.1999, 0.1995, 0.2009, 0.2003]], device='cuda:0',
       grad_fn=<SoftmaxBackward>)

(output truncated: all 29 rows in the batch are identical)