When I was using the cross-entropy loss, the fluctuations were even larger. This is why I am using the Lovasz loss, which is based on the IoU (L = 1 - IoU_c for each class c). [Lovasz Softmax Paper] [Lovasz Overview Slide]
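To make the loss concrete, here is a minimal NumPy sketch of the Lovasz-Softmax surrogate as I understand it from the paper. It is not the code I actually train with: the function names `lovasz_grad` and `lovasz_softmax_flat` are illustrative, the inputs are assumed to be flattened softmax probabilities of shape (P, C) with integer labels of shape (P,), and averaging only over classes present in the batch is an assumption.

```python
import numpy as np


def lovasz_grad(gt_sorted):
    """Gradient of the Lovasz extension of the Jaccard loss.

    gt_sorted: binary ground truth for one class, ordered by decreasing error.
    """
    gts = gt_sorted.sum()
    intersection = gts - np.cumsum(gt_sorted)
    union = gts + np.cumsum(1.0 - gt_sorted)
    jaccard = 1.0 - intersection / union
    # Turn the cumulative Jaccard losses into per-pixel increments.
    jaccard[1:] = jaccard[1:] - jaccard[:-1]
    return jaccard


def lovasz_softmax_flat(probs, labels):
    """Mean over present classes of the surrogate for 1 - IoU_c.

    probs:  (P, C) array of softmax outputs (assumed layout).
    labels: (P,) array of integer class ids.
    """
    losses = []
    for c in range(probs.shape[1]):
        fg = (labels == c).astype(np.float64)
        if fg.sum() == 0:
            continue  # assumption: skip classes absent from this batch
        errors = np.abs(fg - probs[:, c])
        order = np.argsort(-errors)  # largest errors first
        losses.append(np.dot(errors[order], lovasz_grad(fg[order])))
    return float(np.mean(losses))


if __name__ == "__main__":
    labels = np.array([0, 1, 1, 2])
    perfect = np.eye(3)[labels]  # one-hot predictions matching the labels
    print(lovasz_softmax_flat(perfect, labels))
```

In this sketch every present class contributes one term to the mean regardless of how many pixels it has, which is the background to my question about balancing.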
Should I also balance the classes for the Lovasz loss function?