Hello there! For a personal project, I have a dataset of RGB (3-channel) images, and the ground truth for each image has 12 channels. Each channel holds values in the range [0, 1], and the channels are independent of each other (not one-hot encoded).
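To make the target format concrete: each sample pairs a 3-channel RGB image with a 12-channel map in which every channel independently takes values in [0, 1]. The NumPy sketch below uses random placeholder data and an assumed 4x4 resolution. It contrasts a per-channel sigmoid output, which matches independent [0, 1] channels, with a softmax across channels, which would force the 12 values at each pixel to sum to 1 as with one-hot style targets. The helper names (`sigmoid`, `softmax`, `logits`) and the choice of output activation are illustrative assumptions; the post does not say which output activation the U-Net uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder shapes (channels-first); the real image size is not stated.
H, W = 4, 4
image = rng.random((3, H, W))    # RGB input
target = rng.random((12, H, W))  # 12 independent channels, each in [0, 1]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z, axis=0):
    e = np.exp(z - z.max(axis=axis, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

logits = rng.normal(size=(12, H, W))  # hypothetical stand-in for raw network output

independent = sigmoid(logits)  # each channel mapped to (0, 1) on its own
exclusive = softmax(logits)    # channels compete: values sum to 1 at every pixel

mse = float(np.mean((independent - target) ** 2))
print("sigmoid channel sum at pixel (0, 0):", independent[:, 0, 0].sum())
print("softmax channel sum at pixel (0, 0):", exclusive[:, 0, 0].sum())
print("MSE of sigmoid output vs target:", mse)
```

With independent channels, the per-pixel channel sum under the sigmoid head is unconstrained, while the softmax head always sums to 1, which would not match this kind of ground truth.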
I am trying to train a U-Net on this data. As an experiment, I took two samples, trained on one and validated on the other. I expected the model to fit the training sample almost perfectly, since it should overfit a single example after enough epochs, if I am not mistaken. However, the training loss flattens out after a while, and the predictions are not that close to the ground truth either. I am using MSE as the loss function. Is there anything I might be missing? Or should I increase the network's capacity, or use a different encoder backbone for the U-Net?
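The single-sample experiment is a common sanity check: if the model and training loop are wired correctly, the training MSE on one example should keep falling toward zero. A minimal NumPy sketch of that check follows. It deliberately swaps the U-Net for a hypothetical per-pixel linear model (equivalent to a 1x1 convolution mapping 3 input channels to 12 outputs). Its target is constructed to be exactly reachable, and is unbounded rather than limited to [0, 1] for simplicity, so the loss is expected to approach zero. The sizes, learning rate, and step count are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): a 16x16 RGB image and a 12-channel target.
HEIGHT, WIDTH, C_IN, C_OUT = 16, 16, 3, 12
x = rng.random((C_IN, HEIGHT * WIDTH))  # one image, pixels flattened to columns

# Hypothetical target that a per-pixel linear map CAN reproduce exactly,
# so a correctly wired training loop should drive the MSE toward zero.
true_weights = rng.normal(size=(C_OUT, C_IN))
true_bias = rng.normal(size=(C_OUT, 1))
y = true_weights @ x + true_bias

def overfit_single_sample(x, y, lr=2.0, steps=2000):
    """Fit y ~ weights @ x + bias (a 1x1 conv) by full-batch gradient descent on MSE."""
    weights = np.zeros((y.shape[0], x.shape[0]))
    bias = np.zeros((y.shape[0], 1))
    history = []
    for _ in range(steps):
        err = weights @ x + bias - y
        history.append(float(np.mean(err ** 2)))
        grad = 2.0 * err / err.size  # d(MSE)/d(prediction)
        weights -= lr * grad @ x.T
        bias -= lr * grad.sum(axis=1, keepdims=True)
    return history

losses = overfit_single_sample(x, y)
print(f"initial MSE {losses[0]:.4f}, final MSE {losses[-1]:.2e}")
```

In the real setting, the same loop would wrap the U-Net and its optimizer; the quantity to watch is whether the training loss keeps decreasing on the single sample or flattens out well above zero.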
Thanks in advance; I'm looking forward to your opinions and suggestions!