Why are semantic segmentation results blurry/fuzzy? Or, can you use semantic segmentation to predict the 3 RGB channels from an image?

Hi all,
I am working on semantic segmentation using the UNET architecture. I started off by trying to predict the 3 RGB channels of the target directly. I know that semantic segmentation expects each class to be represented by binary values, but I gave it a shot anyway. The results, as shown, are OK but blurry/fuzzy, and I would like to find the source of this error and improve it. Is there a way to improve the prediction of RGB channels without having to split the image into separate per-color masks?

Note that the top picture shows the target (the label we want to predict), while the bottom one shows the prediction output of the UNET architecture.
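One common explanation for this kind of blur is offered here as a hedged illustration, not a diagnosis, because the post does not say which loss was used. If the RGB output is trained as a regression with a pixel-wise squared-error (MSE/L2) loss, the loss-minimizing prediction for a pixel the network is unsure about is the *average* of the plausible colors. Near class boundaries this produces in-between, smeared colors. Treating each label color as a class and decoding with argmax avoids the averaging. The sketch below uses plain Python. The three-color `PALETTE`, the helper names, and the 60/40 probabilities for a single boundary pixel are all hypothetical.

```python
# Hypothetical 3-class palette; the real label colors in the post are not given.
PALETTE = {
    0: (255, 0, 0),  # class 0 -> red
    1: (0, 0, 255),  # class 1 -> blue
    2: (0, 255, 0),  # class 2 -> green
}
COLOR_TO_CLASS = {rgb: cls for cls, rgb in PALETTE.items()}


def rgb_to_class(pixel):
    """Map an RGB label pixel to its class index (exact palette match)."""
    return COLOR_TO_CLASS[tuple(pixel)]


def mse_optimal_rgb(candidate_colors, weights):
    """Color minimizing expected squared error: the weighted mean of candidates."""
    total = sum(weights)
    return tuple(
        sum(w * color[ch] for color, w in zip(candidate_colors, weights)) / total
        for ch in range(3)
    )


def argmax_decode(class_probs):
    """Pick the most likely class and paint it with its palette color."""
    best = max(range(len(class_probs)), key=lambda k: class_probs[k])
    return PALETTE[best]


# A boundary pixel the network is unsure about: 60% red, 40% blue (made-up numbers).
probs = [0.6, 0.4, 0.0]
regressed = mse_optimal_rgb([PALETTE[k] for k in range(3)], probs)
decoded = argmax_decode(probs)

# Converting an RGB label image into a single class-index map for training.
label = [[(255, 0, 0), (0, 0, 255)]]
class_map = [[rgb_to_class(p) for p in row] for row in label]

print("regression output:", regressed)
print("classification output:", decoded)
print("class map:", class_map)
```

The regression output comes out at roughly (153, 0, 102), a purple that matches no label color, while the argmax decode returns pure red. Note that the class-index map is still a single 2D array per image, so this route does not require storing separate per-color mask images; a cross-entropy loss over the class channels would replace the RGB regression loss.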