I’m experiencing strange behavior when training all of my models: the training time of any model (even a relatively simple one) has increased significantly for no apparent reason. For example, training a model like this on a fixed dataset using CUDA previously took about 100 seconds per epoch (including the backward pass, loss computation, and the other small computations in the training script), but now it takes up to 30 minutes!
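For reference, this is roughly how I measure the per-epoch time (a minimal sketch; `train_one_epoch` is just a placeholder for my actual forward/loss/backward/optimizer loop):

```python
import time
import torch

def timed_epoch(model, loader, train_one_epoch):
    # Placeholder timing wrapper around my training loop.
    torch.cuda.synchronize()           # make sure pending GPU work is finished
    start = time.perf_counter()
    train_one_epoch(model, loader)     # forward, loss, backward, optimizer step
    torch.cuda.synchronize()           # wait for this epoch's GPU work to complete
    return time.perf_counter() - start
```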
The slowdown also occurs when training other models that were previously very fast, and I have noticed that CPU training has slowed down as well (could that be related?).
I originally encountered this problem with PyTorch 2.2 and CUDA 11.8. Thinking it was due to those versions, I performed a clean installation of both frameworks, but the same behavior also occurs with PyTorch 2.4 and CUDA 12.4.
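In case it helps, after each reinstall I check the environment with something like this (standard torch attributes, nothing custom):

```python
import torch

# Quick sanity check of the installation and the visible GPU
print(torch.__version__)              # e.g. 2.4.x
print(torch.version.cuda)             # CUDA version PyTorch was built against
print(torch.cuda.is_available())      # confirm the GPU is actually visible
print(torch.cuda.get_device_name(0))  # which device is being used
```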
Could anyone kindly suggest what this could be due to, or what I should check?