Techniques for multiclass semantic segmentation with a large number of output classes

I’m working on a custom dataset with over 100 classes (10K images) and trying to train DeepLabV3+ and UNet++ models (I also tried the base versions). I’m using Dice loss as the training loss and IoU score as the evaluation metric. Training seems to converge within the first few epochs, and after that I don’t see any improvement in either the loss or the IoU.
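To make the metric concrete, here is a minimal sketch of per-class IoU computed from a confusion matrix. Looking at it per class could show whether the plateau comes from a few frequent classes dominating the average. The function name `per_class_iou` and the toy arrays are illustrative assumptions, and this is not necessarily how any particular library computes its IoU score.

```python
import numpy as np

def per_class_iou(pred, target, num_classes):
    """Return IoU for each class; NaN where a class appears in neither array."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm)
    iou = np.full(num_classes, np.nan)
    present = union > 0
    iou[present] = intersection[present] / union[present]
    return iou

# Toy label maps (hypothetical data); class 3 never occurs.
pred = np.array([[0, 0, 1], [1, 2, 2]])
target = np.array([[0, 1, 1], [1, 2, 0]])

iou = per_class_iou(pred, target, num_classes=4)
mean_iou = np.nanmean(iou)  # average only over classes that occur
print(iou, mean_iou)
```

Averaging with `np.nanmean` skips classes that are absent from both prediction and ground truth, so they neither inflate nor deflate the mean.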

What are some best practices or techniques for improving a model’s IoU when working with a large number of output classes? For example, does the choice of model matter most, or should I focus on particular hyperparameters?