CNN architecture and loss for image-to-image regression: L1Loss not decreasing, MSE over-smoothing

I have pairs of images like the ones below (inputs and labels). Both are single-channel images with real-valued pixels, and I want to train a CNN that maps one to the other. I have tried UNet and several UNet variants. With MSE loss the training works reasonably well, but the outputs are very smoothed, whereas the target consists of discrete individual pixels, as you can see. When I switch to L1Loss, the loss doesn't decrease at all. I have also tried other losses such as SSIM and perceptual loss, but none of them worked.
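A minimal sketch of the setup described above, assuming PyTorch (suggested by the `L1Loss` name). The small two-layer network, the random tensors, and the `train_step` helper are hypothetical placeholders for illustration only; they stand in for the actual UNet and image pairs, which are not shown here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder model: a small fully convolutional net standing in for the UNet.
# padding=1 with 3x3 kernels keeps the output the same size as the input.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),
)

# Placeholder data: random single-channel images, shape (batch, 1, H, W).
inputs = torch.randn(8, 1, 64, 64)
targets = torch.randn(8, 1, 64, 64)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)


def train_step(criterion):
    """Run one optimization step with the given loss and return its value."""
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()


# The only difference between the two experiments is the criterion.
mse_value = train_step(nn.MSELoss())
l1_value = train_step(nn.L1Loss())
print(f"MSE: {mse_value:.4f}  L1: {l1_value:.4f}")
```

Swapping `nn.MSELoss()` for `nn.L1Loss()` is the only change between the two runs; everything else (model, optimizer, data) stays the same.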

Maybe I should explore a different architecture.

My questions now are:

  • What kind of architecture would be suitable for this “simple” 1-to-1 pixel-wise mapping problem?
  • What loss function should I use?