I have a 27-class classification problem with a time series of 1600 samples as the input data. My model is a simple feedforward network with 4 hidden layers, and it achieves a very good training loss (0.14). On the test data it has very high accuracy (98%) but a very high loss (2.45). It appears that the model makes very few mistakes, but the mistakes it does make are very confident, so each one contributes a large cross-entropy penalty.
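To illustrate how these two numbers can coexist, here is a minimal sketch with hypothetical per-sample probabilities (the 0.99 confidence on correct predictions and the ~1e-53 true-class probability on the wrong ones are assumed values, chosen only to show that 98% accuracy and a mean loss around 2.45 are arithmetically compatible):

```python
import math

# Hypothetical scenario: out of 100 test samples, 98 are predicted
# correctly with high confidence, and 2 are confidently wrong
# (the model assigns the true class a probability of ~1e-53,
# which is plausible with saturated softmax outputs in float32).
n_correct, n_wrong = 98, 2
loss_correct = -math.log(0.99)   # ~0.01 per confident correct prediction
loss_wrong = -math.log(1e-53)    # ~122 per confident mistake

mean_loss = (n_correct * loss_correct + n_wrong * loss_wrong) / 100
print(f"accuracy = {n_correct}%, mean cross-entropy = {mean_loss:.2f}")
```

So just two extremely confident errors per hundred samples are enough to push the mean loss above 2, even though accuracy looks excellent.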
This happens regardless of the optimiser, the number of epochs, and the initialisation. I also observe similar behaviour with a 1-D convolutional model.
Is there a way to correct this behaviour and reduce the test loss?