Semantic Segmentation: U-net overfits on Pascal VOC 2012

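I suspect that the data augmentation is being applied to the source images without the same transformation being applied to the corresponding masks / labels. For any random geometric transform (flip, crop, rotation), that leaves the image and its label misaligned, so the pixel-wise targets no longer match what the network sees.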
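To illustrate what I mean by applying the same augmentation to both, here is a minimal sketch in NumPy. I don't know which framework or augmentation library you are using, so the function name `paired_random_flip_crop` and its parameters are my own, hypothetical choices. The key point is that the random parameters are drawn once and reused for the image and the mask.

```python
import numpy as np


def paired_random_flip_crop(image, mask, crop_size, rng):
    """Apply one random horizontal flip and one random crop to an image and its mask.

    image: array of shape (H, W, C); mask: array of shape (H, W).
    crop_size: (crop_height, crop_width); rng: a numpy.random.Generator.
    Hypothetical helper for illustration, not taken from any library.
    """
    assert image.shape[:2] == mask.shape[:2], "image and mask must share H and W"
    h, w = mask.shape[:2]
    ch, cw = crop_size

    # Draw the random parameters ONCE, then reuse them for both arrays.
    flip = rng.random() < 0.5
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))

    if flip:
        # Reverse the width axis of both the image and the mask.
        image = image[:, ::-1]
        mask = mask[:, ::-1]

    # Cut the same window out of both arrays.
    image = image[top:top + ch, left:left + cw]
    mask = mask[top:top + ch, left:left + cw]
    return image, mask
```

A quick sanity check is to overlay a few augmented masks on their augmented images before training; if the object outlines no longer line up, the augmentation pipeline is the culprit.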
I would suggest training without the random data augmentation while recording the loss value at every iteration, then comparing that loss curve with the one from the augmented run.
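Recording the loss can be as simple as the sketch below. It is framework-agnostic and assumes a hypothetical `train_step` callable that runs one optimisation step on a batch and returns the scalar loss; the names `record_losses` and `log_path` are also mine, not from any library.

```python
import csv


def record_losses(train_step, batches, log_path=None):
    """Run train_step on every batch and keep (iteration, loss) pairs.

    train_step: hypothetical callable, batch -> scalar loss.
    batches: any iterable of batches.
    log_path: optional CSV file to write the history to.
    """
    history = []
    for iteration, batch in enumerate(batches):
        loss = float(train_step(batch))
        history.append((iteration, loss))

    if log_path is not None:
        # Save the curve so runs with and without augmentation can be compared.
        with open(log_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["iteration", "loss"])
            writer.writerows(history)
    return history
```

Run it once with augmentation disabled and once with it enabled, writing to two different files, and plot the two curves side by side.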