Train loss decreases very slowly with BCELoss()


I am trying to train a multi-label image classification model with an SGD optimizer and a learning rate of 0.0005. As you can see from the train/val loss figure below, the model’s losses are decreasing, but the precision and accuracy scores are really bad, between 0.10 and 0.3.

To tackle the issue, I tried different models (deeper and shallower) and various learning rates, but now the train and val losses decrease very slowly. What is your opinion, or what should I do about it?

Thank you very much in advance,


Hi Goksu,

Making the network deeper and the learning rate smaller increases the training time and makes each gradient update very small, so the model can get stuck in a local minimum. Making the network shallower and increasing the learning rate typically shortens the training time and produces larger gradient updates, which lets the model pick up more general knowledge about the domain you are training on. You should look for a good balance between a shallow and a deep network. You may also want to use Adam or AdamW, as they are less sensitive to the learning rate.
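As a rough sketch of the optimizer swap, assuming you are on PyTorch (your title mentions BCELoss()): the tiny model, the batch shapes, and the lr and weight_decay values below are placeholders I made up, not your actual setup, so treat them as starting points to tune.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the real network: 16 input features, 5 labels.
# BCELoss expects probabilities in [0, 1], hence the final Sigmoid.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 5),
    nn.Sigmoid(),
)
criterion = nn.BCELoss()

# AdamW in place of SGD; both values are assumed defaults, not tuned ones.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

# Random placeholder batch: 8 samples, multi-hot targets for 5 labels.
inputs = torch.randn(8, 16)
targets = torch.randint(0, 2, (8, 5)).float()

losses = []
for _ in range(50):
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

Only the optimizer line differs from an SGD setup, so it is cheap to compare both on the same data before changing the architecture.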
You can also add skip connections, since they mitigate the vanishing gradient problem. I use this strategy a lot, and it has helped me a lot. You may also want to try newer activation functions such as Mish, which have shown really good results compared to other functions, e.g. ReLU.
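Here is a minimal sketch of what a skip connection with Mish could look like in PyTorch. The ResidualBlock name, the channel count, and the layer layout are only illustrative assumptions, and nn.Mish needs a reasonably recent PyTorch (1.9 or later, as far as I know).

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Hypothetical block: two 3x3 convolutions plus an identity skip."""

    def __init__(self, channels: int):
        super().__init__()
        # padding=1 keeps the spatial size, so the input can be added back.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.Mish()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.act(self.conv1(x))
        out = self.conv2(out)
        # Skip connection: gradients can also flow through the identity path.
        return self.act(out + x)


block = ResidualBlock(channels=8)
x = torch.randn(2, 8, 16, 16)  # placeholder batch of feature maps
y = block(x)
y.sum().backward()  # check that gradients reach the first convolution
```

The output keeps the input shape, so blocks like this can be stacked inside an existing backbone without touching the classification head.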

Thank you very much for your help; I will try those techniques.

All the best,
