Extremely slow data-loading on first epoch

My dataloader is extremely slow on the first epoch. I have gone through all the solutions and threads I could find online, and almost nothing helps. These are the things I have tried:

  1. Tested on an SSD with NVMe support; the raw data transfer rate is extremely fast.
  2. Turned off pin_memory; it doesn't help.
  3. Moved from reading a single HDF5 file to individual HDF5 files (see the sketch after this list).
  4. Removed compression (zipping) from the HDF5 data.
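
For context, item 3 currently looks roughly like this. This is only a simplified sketch, not my actual code; the per-sample file layout and the dataset keys ("image", "label") are placeholders:

```python
import h5py
import torch
from torch.utils.data import Dataset


class PerFileH5Dataset(Dataset):
    """One small HDF5 file per sample, opened lazily inside __getitem__."""

    def __init__(self, file_paths):
        # file_paths: list of per-sample .h5 files (placeholder layout)
        self.file_paths = file_paths

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, idx):
        # Open the file inside the worker process instead of in __init__;
        # HDF5 handles shared across forked workers are a common slowdown.
        with h5py.File(self.file_paths[idx], "r") as f:
            image = torch.from_numpy(f["image"][()]).float()
            label = int(f["label"][()])
        return image, label
```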

My model is very small for practical purposes and the batch size is also small, but as the amount of data grows, the dataloader gets slower and slower on the first epoch. Is there any solution to this?

Dataloader link:


Train script:

What is your environment? Are you using a GPU?

Yes,
PyTorch 0.4.0
CUDA 9.2, driver 396.26
8× GTX 1080 Ti GPUs

Ubuntu 16.04, Python 3.5 (if you meant this!)

I'm curious: if you run the model on the CPU, does it have the same issue? Also, do you know which line(s) it hangs on the longest?
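
If it helps to narrow that down, a rough timing loop like the one below (just a sketch; `loader` and `step_fn` stand in for your own DataLoader and training step) would show whether the time goes into fetching batches or into the model itself:

```python
import time


def time_epoch(loader, step_fn):
    """Print how long each batch takes to arrive from the DataLoader vs. how
    long the training step takes. `loader` is your DataLoader and `step_fn`
    is a placeholder for your forward/backward/optimizer step."""
    end = time.time()
    for i, batch in enumerate(loader):
        data_time = time.time() - end        # time spent waiting on the DataLoader
        step_start = time.time()
        step_fn(batch)                       # your forward/backward/optimizer step
        step_time = time.time() - step_start
        print("iter {}: data {:.3f}s, step {:.3f}s".format(i, data_time, step_time))
        end = time.time()
```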

I think I have a temporary solution for now. I have split my dataloader into train and test versions. In the train dataset I no longer return the string tuples I was returning before; I only return the image and its associated label during training. That seems to solve the issue for now, but I'll have to run the big tests again. I'll post here if the problem comes back.
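
Roughly what the split looks like; this is a simplified sketch with placeholder names, not my exact code:

```python
from torch.utils.data import Dataset


class TrainDataset(Dataset):
    """Training view: return only (image, label), no string metadata, so the
    default collate_fn just stacks tensors in the worker processes."""

    def __init__(self, samples):
        # samples: list of (image_tensor, label, metadata_string) placeholders
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image, label, _meta = self.samples[idx]
        return image, label                  # drop the strings during training


class TestDataset(TrainDataset):
    """Evaluation view: keep the metadata string for logging/debugging."""

    def __getitem__(self, idx):
        return self.samples[idx]             # (image, label, metadata_string)
```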

Cheers.