Hi all,
While training a UNet model on my dataset, I've run into an issue with the loss: it saturates at 0.15 after about 220,800 iterations and remains constant through 1.5 million iterations.
I'm using a batch size of 10, and the images are zero-padded to a fixed size before training (sketched below). The dataset is large, containing hundreds of thousands of images, yet the loss plateaus around 0.15 and does not improve further.
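For context, the padding step looks roughly like this (a minimal sketch; the function name `pad_to_size` and the 512×512 target size are illustrative assumptions, not my exact values):

```python
import torch
import torch.nn.functional as F

def pad_to_size(img: torch.Tensor, target_h: int = 512, target_w: int = 512) -> torch.Tensor:
    """Zero-pad a (C, H, W) image to (C, target_h, target_w).

    The 512x512 target is an illustrative assumption.
    """
    _, h, w = img.shape
    # F.pad's tuple is ordered (left, right, top, bottom) for the last two dims
    return F.pad(img, (0, target_w - w, 0, target_h - h), mode="constant", value=0.0)
```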
Interestingly, when zero-padding is avoided and the images are used in their original, variable shapes, the loss decreases to around 0.10 after the same number of iterations.
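In the variable-shape runs, the images can't be stacked into a single tensor, so batching is handled with a list-returning collate function along these lines (an illustrative sketch with a dummy dataset standing in for my actual data pipeline):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class DummyDataset(Dataset):
    """Stand-in for the real dataset: variable-shape (C, H, W) image pairs."""
    def __len__(self):
        return 100

    def __getitem__(self, idx):
        h, w = 200 + idx % 50, 300 + idx % 30
        return torch.randn(3, h, w), torch.randn(3, h, w)

def collate_variable(batch):
    """Keep variable-shape images as lists instead of stacking into one tensor."""
    images, targets = zip(*batch)
    return list(images), list(targets)

loader = DataLoader(DummyDataset(), batch_size=10, collate_fn=collate_variable)
```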
I've also experimented with different learning rates, but higher rates tend to make the loss explode; 0.0001 has worked best so far. The loss functions being used are MSE loss and L1 loss.
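Concretely, the loss computation is along these lines (a sketch assuming the two terms are summed with equal weight, which is an assumption for illustration):

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()
l1 = nn.L1Loss()

def combined_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Equal 1:1 weighting of the two terms is assumed for illustration
    return mse(pred, target) + l1(pred, target)
```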
Any suggestions or insights would be greatly appreciated.