Cannot overfit on a single training sample

Hello there! For my own project, I have a dataset of RGB (3-channel) images, and the ground truth has 12 channels. Each channel contains values in the range [0, 1]. The channels are independent of each other and not one-hot encoded, of course.

I was trying to train a U-Net on this data. As an experiment, I took a couple of samples, trained on one and validated on the other. I was expecting to get close to perfect accuracy on the training sample, since the model should overfit it after a sufficient number of epochs, if I am not mistaken. However, the loss starts to flatten out after a while, and the prediction is not that close to the ground truth either. Note that I am using MSE as the loss function. Is there anything that I might be missing? Should I increase the complexity of the network, or use a different encoder backbone for the U-Net?
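
For context, the single-sample check I'm describing looks roughly like this (a simplified sketch, not my actual code; the tiny conv stack stands in for the real U-Net, the dummy tensors just make it runnable, and the sigmoid output head is an assumption on my part since the targets are in [0, 1]):

```python
import torch
import torch.nn as nn

# Stand-in for the actual U-Net: anything mapping (N, 3, H, W) -> (N, 12, H, W).
# The final sigmoid is an assumption, to keep predictions in [0, 1] like the targets.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 12, kernel_size=3, padding=1), nn.Sigmoid(),
)

# Dummy tensors just so the snippet runs; swap in the real image/target pair.
x = torch.rand(1, 3, 256, 256)    # one RGB training image
y = torch.rand(1, 12, 256, 256)   # its 12-channel ground truth, values in [0, 1]

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Overfit the single sample: with a real U-Net and a real (image, target) pair,
# the training loss should keep dropping; an early plateau suggests a pipeline
# issue (shapes, normalization, output range, learning rate) rather than capacity.
for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    if step % 200 == 0:
        print(step, loss.item())
```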

Thanks in advance, looking forward to your opinions and suggestions!

Just to make sure I understand: you are trying to predict 12 continuous-valued masks from an RGB image? That's a pretty demanding objective, and you might just be running up against your irreducible error. I'd run the full setup and see whether the model generalizes to validation data first. You might need a deeper U-Net or wider decoding layers.
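
One quick check while you're at it: compare the plateaued training loss against what a trivial constant predictor would get, i.e. just outputting each channel's mean everywhere. If the model isn't clearly below that floor, it isn't really learning the mapping yet. Something like this (PyTorch assumed, dummy target purely for illustration):

```python
import torch

# Dummy target for one sample: 12 channels, 500 x 500, values in [0, 1].
y = torch.rand(12, 500, 500)

# MSE of a constant predictor that outputs each channel's mean everywhere.
# This equals the average per-channel variance and is a useful baseline
# to compare the plateaued training loss against.
channel_means = y.mean(dim=(1, 2), keepdim=True)   # shape (12, 1, 1)
baseline_mse = ((y - channel_means) ** 2).mean()
print(baseline_mse.item())
```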

Yes, that is correct. Let's say, for one channel, each value in that channel describes a certain property of the corresponding pixel. More specifically, if the output is 500x500x12, then for each pixel there are 12 values that describe 12 properties of that particular pixel.
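
To make the shapes concrete, here is a tiny sketch of what one target looks like (assuming PyTorch-style channels-first tensors, which is a guess about the framework on my end):

```python
import torch

# Ground truth as described: 500 x 500 pixels, 12 property values per pixel.
target_hwc = torch.rand(500, 500, 12)        # H x W x C layout (e.g. loaded from numpy)

# PyTorch conv layers and MSELoss expect channels-first: (C, H, W) per sample,
# or (N, C, H, W) with a batch dimension.
target_chw = target_hwc.permute(2, 0, 1)     # -> (12, 500, 500)
batch = target_chw.unsqueeze(0)              # -> (1, 12, 500, 500)
print(batch.shape)
```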

I see, so this objective demands a lot from your network. I suspect a wider decoder will eventually be useful, but I wouldn't be concerned with that yet. Try a full-scale train/validation run and see how it goes. If you pulled an off-the-shelf U-Net model, it should mostly work for your application.
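
For example, if you grab a U-Net from segmentation_models_pytorch (just a guess at the library, any implementation is fine), something like this gives you 12 output channels in [0, 1] and a knob for widening the decoder later:

```python
import segmentation_models_pytorch as smp

# Off-the-shelf U-Net: ResNet-34 encoder, 3 input channels, 12 output channels.
# activation="sigmoid" keeps predictions in [0, 1] to match the target range.
model = smp.Unet(
    encoder_name="resnet34",
    encoder_weights="imagenet",
    in_channels=3,
    classes=12,
    activation="sigmoid",
    # If a wider decoder turns out to be needed later, bump these (defaults shown):
    decoder_channels=(256, 128, 64, 32, 16),
)
```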