Hello! I am trying to train a DenseNet, which I defined here.
The cost function appears to converge on both the train and dev sets, as shown in this snapshot (the dev-set cost is not normalized):
However, when I plot the actual label (a vector of 128 numbers) together with the predicted label, the result looks like this:
Has anybody had a similar experience, or does anyone have ideas about what to check and what to try?
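
For reference, here is a minimal sketch of two checks that might make the cost curves and the label plot easier to compare. It assumes the cost is a summed squared error, which is only a guess since the actual cost function is not shown in this post. The helper names `mean_cost` and `per_element_mse` and all numbers are hypothetical, chosen for illustration.

```python
def mean_cost(total_cost, num_examples):
    """Convert a summed cost into a per-example cost so that
    train and dev curves share the same scale."""
    if num_examples <= 0:
        raise ValueError("num_examples must be positive")
    return total_cost / num_examples


def per_element_mse(actual, predicted):
    """Mean squared error between two equal-length label vectors."""
    if len(actual) != len(predicted):
        raise ValueError("vectors must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical summed costs and set sizes, for illustration only.
train_total, n_train = 500.0, 1000
dev_total, n_dev = 60.0, 100
print(mean_cost(train_total, n_train))  # per-example train cost
print(mean_cost(dev_total, n_dev))      # per-example dev cost

# Toy 128-element label compared against a constant prediction.
actual = [float(i % 2) for i in range(128)]
predicted = [0.5] * 128
print(per_element_mse(actual, predicted))
```

Putting both costs on a per-example scale shows whether the dev curve really tracks the train curve. A per-element error on a single 128-number label gives a number to set against the visual mismatch in the plot.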