Validation and training loss almost constant

Hello! I am trying to build a CNN that extracts a physical quantity from images; the target is an array, so this is a kind of regression problem.

I have downloaded and used a PyTorch implementation of DenseNet (GitHub - bamos/densenet.pytorch: A PyTorch implementation of DenseNet.), modified so that there are two conv layers (kernel sizes 5 and 3) before the dense blocks instead of one, and the output of the last dense block is flattened and followed by two fully connected layers. The three dense blocks contain six conv layers each, with growth rates 8, 16, and 32, respectively.
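Roughly, the modified parts look like this (a simplified, self-contained sketch rather than my exact code; the transition layers, channel counts, and pooling size are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseLayer(nn.Module):
    """Single BN-ReLU-Conv layer that concatenates its output to its input."""
    def __init__(self, in_ch, growth):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_ch)
        self.conv = nn.Conv2d(in_ch, growth, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        return torch.cat([x, self.conv(F.relu(self.bn(x)))], dim=1)

class DenseBlock(nn.Module):
    """Stack of dense layers; channels grow by `growth` per layer."""
    def __init__(self, in_ch, growth, n_layers=6):
        super().__init__()
        layers, ch = [], in_ch
        for _ in range(n_layers):
            layers.append(DenseLayer(ch, growth))
            ch += growth
        self.block = nn.Sequential(*layers)
        self.out_channels = ch

    def forward(self, x):
        return self.block(x)

class RegressionDenseNet(nn.Module):
    def __init__(self, out_dim=256):
        super().__init__()
        # two conv layers before the dense blocks (kernel sizes 5 and 3)
        self.stem = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # three dense blocks with growth rates 8, 16 and 32
        self.block1 = DenseBlock(16, 8)
        self.trans1 = nn.Sequential(
            nn.Conv2d(self.block1.out_channels, 32, kernel_size=1),
            nn.AvgPool2d(2),
        )
        self.block2 = DenseBlock(32, 16)
        self.trans2 = nn.Sequential(
            nn.Conv2d(self.block2.out_channels, 64, kernel_size=1),
            nn.AvgPool2d(2),
        )
        self.block3 = DenseBlock(64, 32)
        # last dense block is flattened and followed by two fully connected layers
        self.pool = nn.AdaptiveAvgPool2d(4)  # placeholder pooling to keep the flattened size small
        flat = self.block3.out_channels * 4 * 4
        self.fc1 = nn.Linear(flat, 512)
        self.fc2 = nn.Linear(512, out_dim)

    def forward(self, x):
        x = self.stem(x)
        x = self.trans1(self.block1(x))
        x = self.trans2(self.block2(x))
        x = self.block3(x)
        x = torch.flatten(self.pool(x), 1)
        return self.fc2(F.relu(self.fc1(x)))

# quick shape check: a batch of grayscale 256x256 images -> 256-element outputs
model = RegressionDenseNet()
print(model(torch.randn(2, 1, 256, 256)).shape)  # torch.Size([2, 256])
```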

I have tried different learning rates, optimizers (Adam, RMSprop, SGD), and loss functions (MSE, L1), but the training loss and the averaged validation loss do not seem to change (in some cases a small decrease may be observed for the first few epochs).

I have a dataset of about 10000 items and I split it into training, validation, and test sets (70%, 15%, 15%) using scikit-learn’s train_test_split(). The training data is shuffled when loaded by the DataLoader. A single loaded item contains a 256x256 image and the target array, which also has 256 elements. The image is grayscale and is scaled to values between 0 and 1 only by the ToTensor() transform before entering the model.
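For reference, the data pipeline is set up roughly like this (a simplified sketch with dummy placeholder arrays, not the exact code):

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from sklearn.model_selection import train_test_split

# dummy stand-ins for the real data: 256x256 grayscale images and 256-element targets
images = np.random.randint(0, 256, size=(100, 256, 256), dtype=np.uint8)
targets = np.random.randn(100, 256).astype(np.float32)

class ImageArrayDataset(Dataset):
    def __init__(self, images, targets):
        self.images = images
        self.targets = targets
        self.to_tensor = transforms.ToTensor()  # scales uint8 pixels to [0, 1]

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        x = self.to_tensor(self.images[idx])                         # (1, 256, 256)
        y = torch.as_tensor(self.targets[idx], dtype=torch.float32)  # (256,)
        return x, y

# 70 / 15 / 15 split: carve off 30%, then halve it into validation and test
train_x, rest_x, train_y, rest_y = train_test_split(images, targets, test_size=0.3, random_state=0)
val_x, test_x, val_y, test_y = train_test_split(rest_x, rest_y, test_size=0.5, random_state=0)

train_loader = DataLoader(ImageArrayDataset(train_x, train_y), batch_size=32, shuffle=True)
val_loader = DataLoader(ImageArrayDataset(val_x, val_y), batch_size=32)
```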

My suspicion is that the target array should be scaled or normalized too, but I am not sure how, since some of these arrays contain only positive values, others mostly negative values, and some are in between. The values also differ significantly between arrays, and even within a single array the maximum may be around 0 while the minimum is, let's say, -200. I should also mention that the same model predicts a different physical quantity from the same images mostly accurately, although with different target arrays whose element values lie between 0 and 1. Can someone suggest a strategy to tackle the first problem? Many thanks, and apologies for the long description.

Normalizing the output could work, e.g. if your target has specific bounds.
For example, in the past I worked on a keypoint detection use case where the target coordinates were in the range [0, 96]. While the model might have been able to learn this distribution directly, training was faster after normalizing the targets to [0, 1] and “de-normalizing” them back to [0, 96] to calculate the RMSE as well as the predictions. I don’t know if you could apply a similar workflow, or if your target range is much larger and the normalization would “squeeze” small values into a tiny range.
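A minimal sketch of that normalize / de-normalize workflow could look like this (the training-target tensor and its statistics are just placeholders, and the statistics should be computed on the training set only and reused for validation/test):

```python
import torch

# dummy stand-in for the training targets, shape (N, 256)
train_targets = torch.randn(7000, 256) * 50 - 100
t_min, t_max = train_targets.min(), train_targets.max()

def normalize(y):
    # map targets into [0, 1] using training-set statistics
    return (y - t_min) / (t_max - t_min)

def denormalize(y_norm):
    # map model outputs back to the original physical range
    return y_norm * (t_max - t_min) + t_min

# during training: loss = criterion(model(x), normalize(y))
# for reporting:   rmse = torch.sqrt(torch.mean((denormalize(pred) - y) ** 2))
```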