Issues with Training UNet Architecture Model

Hi all,

While training a UNet architecture model on a dataset, I have encountered a problem with the model’s loss: it saturates at about 0.15 after roughly 220,800 iterations and remains constant up to 1.5 million iterations.

I am using a batch size of 10, and the images are zero-padded before training. The dataset is large, containing hundreds of thousands of images, yet the loss still plateaus around 0.15 and does not improve further.

Interestingly, when zero-padding is avoided and the images are used in their original, variable shapes, the loss decreases to around 0.10 after the same number of iterations.

I have also experimented with different learning rates, but higher rates tend to make the loss explode. The learning rate that has worked best so far is 0.0001. The loss functions being used are MSE loss and L1 loss.

Any suggestions or insights would be greatly appreciated.

The issue you’re facing is likely due to the way zero padding affects the learning process of your U-Net model. By changing how the loss is computed and how padding is handled, you can mitigate these effects. Implementing masking in your loss function is a practical first step, since it directly stops the model from learning from the irrelevant padded regions.
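As a rough illustration (a minimal sketch, not your exact training code), a masked loss can be computed by building a binary mask that marks the valid, unpadded pixels and averaging the per-pixel error only over those pixels. The tensor shapes, the function name, and the way the mask is constructed here are assumptions about your pipeline:

```python
import torch
import torch.nn.functional as F

def masked_mse_l1_loss(pred, target, mask, l1_weight=0.0):
    """Compute MSE (and optionally L1) loss only over valid (unpadded) pixels.

    pred, target: (B, C, H, W) tensors
    mask:         (B, 1, H, W) tensor, 1.0 for real pixels, 0.0 for zero-padding
    """
    # Per-pixel errors, not yet reduced.
    mse = F.mse_loss(pred, target, reduction="none")
    l1 = F.l1_loss(pred, target, reduction="none")

    # Zero out the padded regions and normalise by the number of valid elements.
    valid = mask.expand_as(pred)
    denom = valid.sum().clamp(min=1.0)
    mse = (mse * valid).sum() / denom
    l1 = (l1 * valid).sum() / denom
    return mse + l1_weight * l1


# Hypothetical usage: each image was zero-padded from its original size up to
# (H, W), and the mask records where the real content lives.
B, C, H, W = 10, 1, 256, 256
pred = torch.randn(B, C, H, W, requires_grad=True)
target = torch.randn(B, C, H, W)
mask = torch.zeros(B, 1, H, W)
mask[:, :, :200, :180] = 1.0  # pretend the real content is 200x180
loss = masked_mse_l1_loss(pred, target, mask, l1_weight=0.5)
loss.backward()  # behaves like a normal loss in the training loop
```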

For segmentation tasks (assuming that is what you are training for), Dice + CE is generally a better loss. GitHub - JunMa11/SegLossOdyssey: A collection of loss functions for medical image segmentation
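As a hedged sketch of what a combined Dice + cross-entropy loss can look like (a generic implementation under my own class and argument names, not the exact code from that repository):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiceCELoss(nn.Module):
    """Soft Dice loss combined with cross-entropy for multi-class segmentation."""

    def __init__(self, ce_weight=1.0, dice_weight=1.0, smooth=1e-5):
        super().__init__()
        self.ce_weight = ce_weight
        self.dice_weight = dice_weight
        self.smooth = smooth

    def forward(self, logits, target):
        # logits: (B, C, H, W) raw network outputs; target: (B, H, W) class indices
        ce = F.cross_entropy(logits, target)

        probs = torch.softmax(logits, dim=1)
        target_onehot = F.one_hot(target, num_classes=logits.shape[1])
        target_onehot = target_onehot.permute(0, 3, 1, 2).float()

        # Soft Dice per class, averaged over classes.
        dims = (0, 2, 3)
        intersection = (probs * target_onehot).sum(dims)
        cardinality = probs.sum(dims) + target_onehot.sum(dims)
        dice = (2.0 * intersection + self.smooth) / (cardinality + self.smooth)
        dice_loss = 1.0 - dice.mean()

        return self.ce_weight * ce + self.dice_weight * dice_loss


# Hypothetical usage with 3 classes:
criterion = DiceCELoss()
logits = torch.randn(10, 3, 128, 128, requires_grad=True)
target = torch.randint(0, 3, (10, 128, 128))
loss = criterion(logits, target)
loss.backward()
```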

You could also try training with this library; it generally performs well as a baseline. GitHub - MIC-DKFZ/nnUNet