Torch 2.0 a bit slower without compile?

Hi there!

I just wanted to ask: is it normal that torch 2.0 is slightly slower than torch 1.11 when I don’t use torch.compile? I am getting an average of 45 sec/epoch with 2.0 vs. 40 sec/epoch with 1.11. I am using ResNet18 from timm for metric learning training.
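To make the per-epoch numbers comparable between the two versions, here is a minimal sketch of how the average could be measured. The helper name `average_epoch_seconds` and the `run_epoch` callable are hypothetical placeholders, not part of my actual training script. The optional `sync` hook is there because CUDA kernels run asynchronously, so the clock should only be read after pending GPU work has finished.

```python
import time


def average_epoch_seconds(run_epoch, n_epochs=3, sync=None):
    """Return the mean wall-clock seconds per call of run_epoch().

    run_epoch: hypothetical zero-argument callable that trains one epoch.
    sync: optional zero-argument callable that blocks until the device is idle.
    """
    durations = []
    for _ in range(n_epochs):
        if sync is not None:
            sync()  # finish pending device work before starting the clock
        start = time.perf_counter()
        run_epoch()
        if sync is not None:
            sync()  # wait for this epoch's kernels before stopping the clock
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)
```

On a GPU run, passing `sync=torch.cuda.synchronize` would be the natural choice. It also helps to use the same data loader settings and the same `torch.backends.cudnn.benchmark` value in both environments, so that only the torch version differs.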

I am using a K80 with CUDA 11.1, but I also tried CUDA 11.7.

Unfortunately, torch.compile won’t work on the K80 ;(