Improve UNet Recognition

Hi! I am trying to implement a UNet for semantic segmentation of aerial images. When the network only has to distinguish between water (blue) and ground (green), it works fine, but when I try to recognize ground (green), forest (a darker green) and houses (grey), the network does not learn.
Because the images are large, I split them into 128x128 tiles and keep only the tiles that contain the classes I am interested in. If I want 3 classes, I select the tiles that have:

  1. 20% of the tile with class “agriculture”

  2. 20% of the tile with class “urban”

  3. 5% of the tile unknown
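To make the selection step concrete, here is a minimal sketch of the tiling and filtering, not my exact code. The integer class ids and the helper names (`split_into_tiles`, `class_fractions`, `keep_tile`) are placeholders for illustration; the real masks encode classes as colours and would need converting first. The sketch also assumes the 20% values are minimum fractions and the 5% value is a maximum for the unknown class, which is only one possible reading of the thresholds above.

```python
import numpy as np

# Hypothetical integer class ids (assumption: masks were already converted
# from colour-coded images to one integer label per pixel).
URBAN, AGRICULTURE, UNKNOWN = 0, 1, 6


def split_into_tiles(array, tile=128):
    """Yield non-overlapping tile x tile crops; incomplete border tiles are dropped."""
    height, width = array.shape[:2]
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            yield array[top:top + tile, left:left + tile]


def class_fractions(mask_tile, class_ids):
    """Return the fraction of pixels in the tile that belong to each class id."""
    total = mask_tile.size
    return {c: np.count_nonzero(mask_tile == c) / total for c in class_ids}


def keep_tile(mask_tile, min_frac, max_frac):
    """Keep a tile if every minimum and every maximum fraction is satisfied."""
    fractions = class_fractions(mask_tile, set(min_frac) | set(max_frac))
    enough = all(fractions[c] >= f for c, f in min_frac.items())
    not_too_much = all(fractions[c] <= f for c, f in max_frac.items())
    return enough and not_too_much


# Assumed thresholds: at least 20% agriculture, at least 20% urban,
# at most 5% unknown.
MIN_FRAC = {AGRICULTURE: 0.20, URBAN: 0.20}
MAX_FRAC = {UNKNOWN: 0.05}


def select_tiles(mask, tile=128):
    """Return the mask tiles that pass the selection rule."""
    return [t for t in split_into_tiles(mask, tile) if keep_tile(t, MIN_FRAC, MAX_FRAC)]
```

Printing `class_fractions` over all selected tiles is a quick way to check how many pixels of each class actually end up in the training set.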

I added more images, but the loss is not improving. What can I try to improve it? Many thanks!
Dataset: DeepGlobe Land Cover Classification Dataset (Kaggle)