This is a simple but general problem. Recently I was training a deep learning model in PyTorch, but no matter how I modified the number of layers, the channels, or the layer output sizes, the training loss stopped improving after a few hundred iterations. Changing the optimization algorithm did not help much. Is there an automatic way to analyze which layer is the bottleneck that prevents the training loss from decreasing or the model parameters from updating?
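The closest thing I know of is manually checking per-layer gradient norms after `backward()`. Here is a minimal sketch of what I mean (the toy model is just a placeholder, not my actual network):

```python
import torch
import torch.nn as nn

# Placeholder toy model standing in for the real network.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Print the gradient norm of every parameter; a layer whose gradients
# are near zero (or exploding) is a candidate bottleneck.
for name, p in model.named_parameters():
    print(f"{name}: grad norm = {p.grad.norm().item():.6f}")
```

But this is manual inspection, not the automatic analysis I was hoping for.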
Thanks for your help!
The answer to your question is complex. You are assuming that a single layer is the bottleneck of your model, but in fact many different factors can affect training. Here are just a few examples:
- Data - your dataset may put a ceiling on the success of your model. If your dataset contains labeling errors, for example, or is simply very difficult, your network will never be able to push accuracy past those errors.
- Weight decay - if your optimizer uses weight decay, you may have set it too high. That would prevent your model from fitting the data more closely. Decreasing the weight decay could, of course, cause overfitting, but that is another issue.
- Learning rate - when the loss has reached a plateau, you may want to decrease your learning rate to allow your model to descend into a lower minimum. Consider using a learning rate scheduler that reduces the lr when you hit the plateau.
Many other factors can affect training. If you are using a known model and dataset, look for the published training recipe for that model; it will give you a good start.
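The scheduler suggestion above can be sketched with `torch.optim.lr_scheduler.ReduceLROnPlateau`, which cuts the lr when a monitored metric stops improving. The model and the constant dummy loss below are placeholders just to show the mechanics:

```python
import torch
import torch.nn as nn

# Placeholder model and optimizer for illustration only.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Halve the learning rate when the loss has not improved for 5 epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5
)

for epoch in range(20):
    # ... a real training loop producing an epoch loss would go here ...
    epoch_loss = 1.0  # constant dummy loss, simulating a plateau
    scheduler.step(epoch_loss)  # the scheduler reacts to the plateau
    print(epoch, optimizer.param_groups[0]["lr"])
```

Because the dummy loss never improves, the printed lr drops every `patience + 1` epochs; with a real loss it only drops once training actually stalls.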
@Ofri_Masad That’s very insightful. Is there a module in PyTorch that I can use to monitor the weight decay in each layer?
PS: just out of curiosity; I might be overcomplicating the problem.
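As far as I know there is no dedicated monitoring module for this; weight decay is a per-parameter-group setting that you can read directly from `optimizer.param_groups`. A minimal sketch, with a placeholder model and assumed per-layer grouping:

```python
import torch
import torch.nn as nn

# Placeholder model for illustration.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

# Per-layer weight decay is set via parameter groups,
# e.g. decay on the first layer and none on the last.
optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters(), "weight_decay": 1e-4},
        {"params": model[2].parameters(), "weight_decay": 0.0},
    ],
    lr=0.01,
)

# Inspecting the current setting is just reading it back per group.
for i, group in enumerate(optimizer.param_groups):
    print(f"group {i}: weight_decay={group['weight_decay']}")
```

If you keep one parameter group per layer like this, "monitoring" weight decay is simply printing these values during training.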