Choice of loss function

Hello there, I am training a ConvNet that regresses a 500x500x13 output from 3-channel RGB input images (500x500x3). I started from the vanilla U-Net and made a few small changes to it. I have a dataset of around 9000 images. The values in the output channels are independent of each other, and they all range between 0 and 1. Each channel represents a certain property (much like the red channel of an image carries red intensities, each output channel carries significance on its own).

After starting training, the training loss goes down right from epoch 1. The validation loss, however, goes up for the first 3-4 epochs, then comes down slightly for another 5-6 epochs, and then flattens out (sometimes rising a little, sometimes staying flat).

During this I was using the sum of squared errors, which gives a loss of, let's say, 100,000 or so at the beginning of training, since it sums the error over every pixel of every channel. Should I use regular MSE instead? The reason I didn't is that taking the mean gives very small loss values right from the start (values like 0.08 or 0.5).
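To make the sum-vs-mean question concrete (a toy sketch in plain Python, with made-up numbers): MSE is just the sum of squared errors divided by the number of elements, so the two differ only by a constant scale factor. Your 0.08 and your 100,000 can describe exactly the same fit.

```python
# Toy illustration: MSE is SSE divided by a constant (the element count),
# so switching between them only rescales the loss, not the fit quality.
preds   = [0.2, 0.7, 0.9, 0.1]   # hypothetical predictions in [0, 1]
targets = [0.0, 1.0, 1.0, 0.0]   # hypothetical ground truth

sse = sum((p - t) ** 2 for p, t in zip(preds, targets))
mse = sse / len(preds)

print(sse, mse)  # SSE is exactly len(preds) times MSE
```

With 500x500x13 outputs the divisor is 3,250,000 per sample, which is why the sum looks enormous while the mean looks tiny; neither number is "wrong" on its own.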

Is there anything I am doing very wrong? Am I missing something obvious here? Is my choice of loss bad? And why would the validation loss rise from epoch 1 to around epoch 4?

Your suggestions would be greatly appreciated and will make sure that I am at least in the right ballpark. Thanks!

How are you splitting your data into training and validation sets? My hunch is that your data isn’t shuffled before splitting, which can result in your training set not being representative of the dataset as a whole.

I am using random_split to get the train and validation sets. It's a 90/10 split.
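For reference, roughly like this (a minimal sketch with a dummy `TensorDataset` standing in for the real image dataset):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Dummy stand-in for the real 9000-image dataset.
dataset = TensorDataset(torch.randn(9000, 3, 4, 4))

n_val = int(0.1 * len(dataset))      # 10% for validation
n_train = len(dataset) - n_val       # remaining 90% for training
train_set, val_set = random_split(dataset, [n_train, n_val])

print(len(train_set), len(val_set))  # 8100 900
```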

I would try the following two things as initial troubleshooting:

  • Can you try a different random seed? This will rule out that you just got an unlucky split of your data.
  • Can you try using a larger validation set? I would try a 70/30 split, just to ensure that the distribution of the val set is closer to that of the training set.
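Both suggestions can be sketched as follows (a hedged example with a dummy dataset; the seed value is arbitrary). `random_split` accepts a `generator` argument, so a different `manual_seed` gives a different, reproducible split, and changing the length list changes the split ratio:

```python
import torch
from torch.utils.data import TensorDataset, random_split

dataset = TensorDataset(torch.randn(1000, 3))  # dummy stand-in dataset

# A seeded generator makes the split reproducible; changing the seed
# produces a different split, which helps rule out an unlucky one.
gen = torch.Generator().manual_seed(1234)

n_val = int(0.3 * len(dataset))                # larger 30% validation set
train_set, val_set = random_split(
    dataset, [len(dataset) - n_val, n_val], generator=gen
)

print(len(train_set), len(val_set))            # 700 300
```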

Also, what optimizer are you using? There is nothing obviously wrong with your choice of loss, but whether I use MSE with sum or mean reduction depends on whether I am using SGD or something like Adam.
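One way to see why the optimizer matters here (a small sketch, not specific to your model): with `reduction="sum"` the gradient is exactly N times larger than with `reduction="mean"`, so plain SGD effectively multiplies the learning rate by the element count, while Adam's per-parameter normalization largely cancels that constant factor.

```python
import torch

x = torch.randn(8, requires_grad=True)  # toy "predictions"
y = torch.zeros(8)                      # toy targets

# Gradient under sum reduction.
torch.nn.functional.mse_loss(x, y, reduction="sum").backward()
grad_sum = x.grad.clone()

# Gradient under mean reduction.
x.grad = None
torch.nn.functional.mse_loss(x, y, reduction="mean").backward()
grad_mean = x.grad.clone()

# Sum-reduced gradients are exactly N (= 8) times the mean-reduced ones.
print(torch.allclose(grad_sum, grad_mean * 8))  # True
```

So if you switch from sum to mean with SGD, expect to retune the learning rate by roughly that factor; with Adam the difference is much smaller.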