Overfitting and hanging

My classification program sometimes hangs at a random epoch (it simply stops advancing) and tends to overfit. How can I fix these two problems?
This is the program: train.py (Dropbox link)
Thank you for helping me
Best regards

Does no one have an answer, please?

Anybody, please?

Your description and code snippet don’t isolate the issue sufficiently, as the code is not even executable.
I don’t see any obvious issues in the code, so you would need to narrow down where exactly it hangs and isolate that further into a minimal, executable code snippet.
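One way to narrow down a hang is Python's built-in faulthandler module, which can dump the Python stack of every thread if a step takes too long. The sketch below is only an illustration: train_one_epoch, the timeout, and the log file name are hypothetical placeholders, not taken from your train.py. It shows which Python line is blocked; it cannot tell you by itself whether the GPU is the cause.

```python
import faulthandler
import time


def train_one_epoch(epoch):
    # Hypothetical stand-in for the real training loop of one epoch.
    time.sleep(0.01)
    return epoch


def run(num_epochs=3, hang_timeout_s=600, log_path="hang_traceback.log"):
    completed = []
    # The log file must stay open while the watchdog is armed.
    with open(log_path, "w") as log_file:
        for epoch in range(num_epochs):
            # If this epoch takes longer than hang_timeout_s seconds, write the
            # traceback of all Python threads to log_file and keep running.
            faulthandler.dump_traceback_later(hang_timeout_s, file=log_file)
            try:
                completed.append(train_one_epoch(epoch))
            finally:
                # Disarm the watchdog once the epoch has returned.
                faulthandler.cancel_dump_traceback_later()
            print(f"epoch {epoch} done", flush=True)
    return completed
```

If the program hangs, the log file should contain the stack at the moment the timeout expired, which is a good starting point for a minimal, executable snippet.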

OK, but could you explain how to avoid overfitting? The hanging also occurs randomly with other code; I suspect the GPU is overheating.

Your model overfits the training dataset if it learns the training data well while the validation loss or accuracy shows a gap relative to training. This means the model is learning features specific to individual training samples instead of features that generalize to the classes.
This is often observed if the dataset is too small compared to the “capacity” of the model (i.e. its number of trainable parameters). I’m sure the literature can explain it better and in more detail.
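As a rough illustration of that gap, you could record the training and validation loss per epoch and compare them. The helper names and the patience value below are hypothetical, and the rule (training loss falling while validation loss rises for a few consecutive epochs) is just one simple heuristic, not a formal definition of overfitting.

```python
def generalization_gap(train_losses, val_losses):
    """Return the per-epoch difference (validation loss - training loss)."""
    return [v - t for t, v in zip(train_losses, val_losses)]


def first_overfitting_epoch(train_losses, val_losses, patience=2):
    """Return the first epoch of a run of `patience` consecutive epochs in
    which the training loss fell while the validation loss rose, or None."""
    streak = 0
    for epoch in range(1, min(len(train_losses), len(val_losses))):
        train_improved = train_losses[epoch] < train_losses[epoch - 1]
        val_worsened = val_losses[epoch] > val_losses[epoch - 1]
        streak = streak + 1 if (train_improved and val_worsened) else 0
        if streak >= patience:
            return epoch - patience + 1
    return None
```

A gap that keeps growing from epoch to epoch is the pattern to look for.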
You would usually try to increase the dataset size, use a more aggressive data augmentation strategy, or reduce the model capacity, either directly by using a smaller model, by adding e.g. dropout during training, or by fine-tuning a frozen, pretrained model.
We have a lot of topics on this discussion board where users are trying to avoid overfitting, so reading through some of them might also help and give you ideas for tackling this issue.

OK, thank you. I have about 1000 images; would a resnet50 be better? And how do I add dropout, please? Thank you. Best regards

I don’t know and you would need to run some experiments.

You can add it to the model directly as an nn.Dropout module or via the functional API:

x = F.dropout(x, training=self.training)

in the model’s forward method.
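For context, here is a minimal sketch of where that line would sit. SmallClassifier and its layer sizes are made up for illustration; the important part is passing training=self.training, so that dropout is active after model.train() and disabled after model.eval().

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallClassifier(nn.Module):
    # Hypothetical two-layer classifier, used only to show the dropout call.
    def __init__(self, in_features=16, hidden=32, num_classes=4, p=0.5):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)
        self.fc2 = nn.Linear(hidden, num_classes)
        self.p = p

    def forward(self, x):
        x = F.relu(self.fc1(x))
        # Randomly zeroes activations in training mode; a no-op in eval mode.
        x = F.dropout(x, p=self.p, training=self.training)
        return self.fc2(x)
```

Without the training argument, F.dropout defaults to training=True and would keep dropping activations during validation as well.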

Could you please give me a piece of code including the dropout ?

Sure! Assuming you want to add another linear layer as well as a dropout layer to resnet50’s classifier (which is a single linear layer assigned to model.fc), you could replace it with a new nn.Sequential container and add the new layers as well as the original linear layer to it, as seen here:

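import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50()

nb_features = model.fc.in_features
# New classifier: new linear layer -> dropout -> original linear layer
model.fc = nn.Sequential(
    nn.Linear(nb_features, nb_features),
    nn.Dropout(p=0.5),  # p=0.5 is the nn.Dropout default
    model.fc,
)

x = torch.randn(2, 3, 224, 224)
out = model(x)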