Problem fine-tuning semantic segmentation with "Semantic Segmentation on MIT ADE20K dataset in PyTorch"

I want to fine-tune a semantic segmentation model on a dataset of 2100 images collected in Habitat-sim (GitHub - facebookresearch/habitat-sim: A flexible, high-performance 3D simulator for Embodied AI research).

I use the HRNetV2 model from Semantic Segmentation on MIT ADE20K dataset in PyTorch, which was pre-trained for 30 epochs on ADE20K.

Semantic Segmentation on MIT ADE20K dataset in PyTorch splits the model into two parts, an encoder and a decoder.

Because I want to do transfer learning, I only load the weights of the encoder.
ADE20K has 150 categories, while my dataset has 101, so I change num_class to 101.
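
To make the setup concrete, my model construction is roughly equivalent to the sketch below, using the repo's `ModelBuilder` and `SegmentationModule` from `mit_semseg.models`. The checkpoint path is a placeholder for where I put the pre-trained encoder, and `fc_dim=720` is taken from the repo's hrnetv2 config; treat the exact values as approximate rather than my literal config.

```python
import torch
from mit_semseg.models import ModelBuilder, SegmentationModule

# Encoder: HRNetV2 with the ADE20K pre-trained weights (loaded from disk).
encoder = ModelBuilder.build_encoder(
    arch="hrnetv2",
    fc_dim=720,                                   # from the repo's hrnetv2 config
    weights="ckpt/ade20k-hrnetv2-c1/encoder_epoch_30.pth",  # placeholder path
)

# Decoder: trained from scratch, with num_class changed from 150 to 101.
decoder = ModelBuilder.build_decoder(
    arch="c1",
    fc_dim=720,
    num_class=101,       # my dataset's label count
    weights="",          # no pre-trained decoder weights
)

crit = torch.nn.NLLLoss(ignore_index=-1)
segmentation_module = SegmentationModule(encoder, decoder, crit)
```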

After training, I run an evaluation on another dataset and get disastrous results.


From left to right: scene, ground truth, and prediction.
The shapes of the masks are approximately correct, but the colors (i.e., the predicted classes) are all wrong.
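
For context, the prediction images are rendered by indexing a color palette with the predicted class index, roughly like the hypothetical sketch below (`palette` stands in for my dataset's [101, 3] color table, not the repo's ADE20K colormap). If the class indices don't line up with the palette rows, the masks can look right in shape but wrong in color.

```python
import numpy as np

def colorize(pred, palette):
    """Map a [H, W] array of class indices to an RGB image.

    `palette` is assumed to be a [num_class, 3] uint8 array where row i
    holds the display color of class i.
    """
    return palette[pred]  # fancy indexing: [H, W] -> [H, W, 3]

# Hypothetical usage:
# pred = scores.argmax(dim=1)[0].cpu().numpy()   # scores from the segmentation module
# rgb = colorize(pred, my_palette)
```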
Does anybody have an idea of how this problem arises and how it can be fixed? Thank you!