How to speed up training on ImageNet

I am trying to figure out how to train a classifier on ImageNet… because I want to train classifiers with different image sizes…
To do so, I am using the ImageNet example from the pytorch/examples repository on GitHub (examples/imagenet, master branch) as a framework.
When I train it on ImageNet, it takes around 16 hours per epoch on an A100, which is rather slow.

How can I improve training speed besides adjusting the number of workers? I am also upgrading to Python 3.9…
Do I have to go through all samples in the validation set, or can I just take a subset?
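
For the validation subset, this is roughly what I have in mind: pick the same number of images from every class, with a fixed seed so the subset stays the same between runs. `stratified_subset_indices` is a hypothetical helper, and I am assuming the labels are available as a plain list of class indices (for a torchvision `ImageFolder` dataset that would be `dataset.targets`).

```python
import random
from collections import defaultdict


def stratified_subset_indices(labels, per_class, seed=0):
    """Pick up to `per_class` sample indices from each class.

    `labels` is assumed to be a sequence of class ids, one per sample.
    A fixed `seed` keeps the subset identical across runs, so validation
    numbers stay comparable between epochs.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for label in sorted(by_class):
        idxs = by_class[label]
        k = min(per_class, len(idxs))  # classes with fewer samples keep all of them
        chosen.extend(rng.sample(idxs, k))
    return sorted(chosen)


# Toy example: 3 classes with 3, 2 and 1 samples.
toy_labels = [0, 0, 0, 1, 1, 2]
subset = stratified_subset_indices(toy_labels, per_class=2, seed=0)
print(len(subset))

# With PyTorch, the indices could then be wrapped like this (not run here):
#   val_subset = torch.utils.data.Subset(val_dataset, subset)
```

The ImageNet (ILSVRC 2012) validation set has 50 images for each of the 1000 classes, so `per_class=10` would give 10,000 images. I assume the accuracy on such a subset is a noisier estimate, so final numbers would still need the full validation set.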

The post below is all you need…