Training DeepLabV3+ on the Pascal VOC 2012 dataset with PyTorch

I’m trying to train the DeepLabV3+ architecture with a ResNet101 backbone on the Pascal VOC 2012 semantic segmentation dataset. I’m using ImageNet-pretrained weights and I freeze the backbone weights during training. I also apply some transformations to the training data, such as random flips and random rotations. The problem is that the model’s mIoU is very low on both the training and validation sets, approximately 40%, and does not improve further. It looks like underfitting. On the other hand, when I’m not using data augmentation, the model overfits heavily. Does anyone have suggestions on how I can reproduce good results on the Pascal VOC dataset with the DeepLabV3+ architecture? It would help a lot. Note that I’m referring to the VOC dataset with 1464 training and 1449 validation images, not the augmented dataset.
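
In case the metric itself is part of the problem, here is a minimal sketch of how I understand mIoU is usually computed for VOC. It is not necessarily identical to my training code: the function names are only illustrative, it works on flattened label sequences instead of tensors, and it assumes the usual VOC conventions of 21 classes with the void label 255 excluded from the score.

```python
# Sketch: mean IoU from a confusion matrix, standard library only.
# Assumptions (not taken from my training code): void pixels are labeled 255
# and ignored; classes that appear in neither predictions nor ground truth
# are left out of the mean.

def confusion_matrix(preds, targets, num_classes, ignore_index=255):
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for pred, target in zip(preds, targets):
        if target == ignore_index:
            continue  # void / boundary pixels do not count
        matrix[target][pred] += 1
    return matrix


def mean_iou(matrix):
    """Average the per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    num_classes = len(matrix)
    ious = []
    for c in range(num_classes):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(num_classes)) - tp
        denom = tp + fp + fn
        if denom == 0:
            continue  # class absent everywhere, skip it
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Tiny example with 3 classes; the pixel labeled 255 is ignored.
preds = [0, 0, 1, 1, 2, 2]
targets = [0, 1, 1, 1, 255, 2]
print(mean_iou(confusion_matrix(preds, targets, num_classes=3)))
```

As far as I know, the confusion matrix is normally accumulated over the whole split and mIoU is computed once at the end, so averaging per-batch values would give a different number.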

Thank you very much.