PyTorch 2 is too slow on AWS EC2 and SageMaker

Hi all,

TLDR: PyTorch 2 (including the latest 2.4 release) is much slower than PyTorch 1.13.1 on AWS EC2/SageMaker. Solutions provided on the web help, but it is still 3-5 times slower.

Details: Whether I use one of the available PyTorch 2 Ubuntu AMIs or re-install PyTorch 2 via conda/pip, training and inference are much slower than with PyTorch 1.13.1 (same code, same instance, different PyTorch versions in different containers). I have seen this on AWS p3/p4/g5 instances, so it is probably unrelated to instance type.
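
For reference, this is a minimal sketch of the kind of timing loop I run in each container; the model and dataset below are simplified stand-ins for my actual workload, not the real pipeline:

```python
import time

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data and model; my real workload is an image training pipeline.
dataset = TensorDataset(torch.randn(20_000, 3, 64, 64), torch.randint(0, 10, (20_000,)))
loader = DataLoader(dataset, batch_size=64, num_workers=8, pin_memory=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Time one epoch end to end; the same script runs in both containers.
start = time.perf_counter()
for x, y in loader:
    x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
if device.type == "cuda":
    torch.cuda.synchronize()
print(f"one epoch: {time.perf_counter() - start:.1f}s")
```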

I tracked it down to the DataLoader. People say it is due to OpenMP (and CPU affinity), which is used for multi-threaded/multi-process data loading. Following the recommended solutions (e.g., installing intel-openmp instead of llvm-openmp, exporting KMP_AFFINITY=disabled, importing torch before numpy) seems to help, but it is still 3-5 times slower (see the sketch below for how I apply them and time the loader).
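
Here is a minimal sketch of how I apply those workarounds and isolate the DataLoader itself; the dataset, worker count, and other loader settings are placeholders rather than my exact configuration:

```python
import os

# Workaround suggested on the forums: disable KMP affinity before torch/OpenMP loads.
# (intel-openmp is installed in the environment via `conda install intel-openmp`.)
os.environ.setdefault("KMP_AFFINITY", "disabled")

import time
import torch        # imported before numpy, as recommended
import numpy as np  # imported after torch on purpose
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset; my real pipeline reads images from disk with heavier transforms.
dataset = TensorDataset(torch.randn(20_000, 3, 64, 64), torch.randint(0, 10, (20_000,)))
loader = DataLoader(
    dataset,
    batch_size=64,
    num_workers=8,
    pin_memory=True,
    persistent_workers=True,
)

# Iterate the loader only (no model, no GPU transfer) to isolate data-loading cost.
start = time.perf_counter()
for batch in loader:
    pass
print(f"DataLoader-only pass: {time.perf_counter() - start:.1f}s")
```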

What changed in the PyTorch 2 DataLoader that causes such a huge difference in speed?

Has anyone experienced this on AWS EC2/SageMaker and solved it? Or does anyone have ideas about the root cause and potential solutions?