Unable to reproduce training performance

I trained a RegNet model on a custom dataset for an image classification task back in August 2023. Now I want to train exactly the same model again, using the same dataset. I would expect this new model to achieve about the same performance as the one from August 2023, since nothing has changed:

  • I use exactly the same PyTorch and Torchvision versions (1.13 and 0.14)
  • I use exactly the same image dataset for training/validation/test
  • I use exactly the same script to train the model via torch
  • And I use exactly the same training hyperparams as before

However, even though nothing has changed, the newly trained model performs significantly worse than the original model from last year. While the first model from August 2023 achieves a test accuracy of 0.97, the new model only achieves 0.94 on the very same test dataset. During training, though, the training and validation accuracies are about the same as before.

I understand that two models will not achieve exactly the same performance, but a 3% difference seems too much. Whatever I do, I cannot get close to that 0.97 test accuracy from last year; about 0.94 is all I get, even though everything is exactly the same, as described. Even the machine with its four GPUs and the Ubuntu version running on it are exactly the same as in 2023.

I know there is a random seed involved, but I doubt that alone could lead to such a large test accuracy difference of 3%. I also know that the NVIDIA / CUDA driver on that machine may have been updated in the meantime, along with some dependencies and packages (e.g. numpy). But can that lead to such a huge difference?
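To at least see what has actually changed, I can dump the relevant versions on the machine and compare them with what I can still reconstruct from last year. A quick sketch (nothing in it is specific to my training script):

```python
import subprocess
import numpy as np
import torch
import torchvision

# Library versions (a silent pip/conda update would show up here)
print("torch      :", torch.__version__)
print("torchvision:", torchvision.__version__)
print("numpy      :", np.__version__)

# CUDA toolkit and cuDNN versions bundled with this torch build
print("CUDA       :", torch.version.cuda)
print("cuDNN      :", torch.backends.cudnn.version())

# NVIDIA driver version as reported by nvidia-smi
driver = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]
)
print("driver     :", driver.decode().strip())
```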

Did you check it on your setup with different seeds? Also, was the previous run from last year a single run, or were you able to reproduce it back then?
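If not, it might be worth pinning every source of randomness once, just to rule the seed in or out. Roughly like this sketch (`seed_everything` is just an example helper, not something from your script; `torch.use_deterministic_algorithms(True)` is strict and will raise on ops that have no deterministic implementation):

```python
import os
import random
import numpy as np
import torch

def seed_everything(seed: int = 42) -> None:
    """Pin the common sources of randomness for one training run."""
    random.seed(seed)                 # Python RNG
    np.random.seed(seed)              # NumPy RNG
    torch.manual_seed(seed)           # CPU RNG
    torch.cuda.manual_seed_all(seed)  # RNGs on all GPUs
    # Force cuDNN onto deterministic kernels instead of benchmarking
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Strict mode: error out if any op lacks a deterministic variant
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    torch.use_deterministic_algorithms(True)

seed_everything(42)
# Note: with num_workers > 0, the DataLoader additionally needs a seeded
# generator / worker_init_fn to make shuffling and augmentations repeatable.
```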

Thanks for your reply!

Ad seeds - I trained over 100 models during the last few days, and none of them achieved last year's performance. Since every such trial is based on a different random seed anyway, I conclude that the seed alone does not lead to accuracy changes of multiple percent.
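To make that concrete, this is how I summarize the test accuracies over all those runs (`accs` below is a stand-in with made-up values, not my actual results):

```python
import numpy as np

# Stand-in for the measured test accuracies of the ~100 trained models
accs = np.array([0.941, 0.938, 0.943, 0.940, 0.939])  # hypothetical values

print(f"mean={accs.mean():.4f}  std={accs.std():.4f}  "
      f"min={accs.min():.4f}  max={accs.max():.4f}")
# If 0.97 lies many standard deviations above max(accs), seed noise
# alone is an unlikely explanation for the gap.
```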

Ad reproducibility - Unfortunately I don’t know. Last year I trained only once and was quite happy with the results, so there was no need to train another model. Thus I cannot say whether that was a super “lucky punch” back then, or whether the setup was somehow different.