First batch of ImageNet training is slow with sequential loading

Hello, I’m trying to train VGG16 from scratch on ImageNet. The loader in the example is quite slow even after resizing the images, and I can’t put them on an SSD, so I tried to build a sequential data loader by modifying this code:

My machine is an NC24 instance on Azure (4 x K80 GPUs). I’m using Python 3.6 and PyTorch 0.3.0 with CUDA 8.0 and cuDNN 7.0.

Some weird things happen:
1 - The first batch takes far longer than the others: 2-3 minutes for the first batch vs. about 3 seconds for each following batch.
2 - 3 seconds per batch is an improvement over the 7-8 seconds of the old dataloader, but the BayesWatch README reports about 0.5 s per batch.
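
A minimal sketch of how per-batch times like the ones above can be measured, assuming the loader is any Python iterable. `time_batches` and `slow_first_loader` are hypothetical names used only for illustration; they are not the actual loader code. The stand-in loader uses only the standard library and simulates a one-off startup cost before the first batch.

```python
import time


def time_batches(loader, max_batches=5):
    """Return the seconds spent waiting for each of the first `max_batches` batches.

    The clock starts before iter(loader), so any iterator setup cost
    (for a PyTorch DataLoader, typically where worker processes start)
    is counted in the first entry.
    """
    timings = []
    start = time.perf_counter()
    it = iter(loader)
    for _ in range(max_batches):
        try:
            next(it)
        except StopIteration:
            break  # loader had fewer batches than requested
        now = time.perf_counter()
        timings.append(now - start)
        start = now
    return timings


def slow_first_loader(n_batches=3, warmup=0.05, per_batch=0.01):
    """Hypothetical stand-in loader with a one-off cost before the first batch."""
    time.sleep(warmup)  # simulates startup work such as opening files
    for i in range(n_batches):
        time.sleep(per_batch)  # simulates the steady-state cost of one batch
        yield i


timings = time_batches(slow_first_loader())
for i, t in enumerate(timings):
    print(f"batch {i}: {t:.3f} s")
```

This only times the wait for data. If the training step on the GPU is timed as well, calling `torch.cuda.synchronize()` before reading the clock is probably needed, since CUDA kernels run asynchronously.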

Do you have any idea why issue 1 happens? I can share the code if it helps (I need to make it a bit more readable first).

Did you ever figure this out? I am having a similar issue.