Speed up dataset loading on Google Colab

Hello guys,

I am working on image classification on the German Traffic Sign Dataset (GTSRB) on Google Colab with PyTorch.
Here is the structure of the dataset:

  • GTSRB
    • Training
      • 00000/
        • *.ppm
      • 00043/
        • *.ppm
    • Test
      • *.ppm
      • labels.csv

I have managed to upload the whole dataset to my Drive (it took a long time!).
I used the ImageFolder class to load the training images and a custom Dataset class to load the test images.

However, training my model is really slow, and the GPU is not used efficiently. After a lot of searching, I found that the per-file transfer from Drive to Colab is the bottleneck.

Does anyone know how I can use an HDF5 dataset (or another technique) to store all the training and test images up front for later preprocessing?
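In case it helps clarify what I mean: something along these lines, where the many small image files are packed once into a single HDF5 file so Colab reads one large file instead of thousands of small ones. The file name, image size, and dummy arrays here are assumptions for the sketch, using `h5py` and NumPy:

```python
# Hedged sketch: pack images + labels into one HDF5 file, then read them back.
# Random arrays stand in for the real decoded .ppm images (sizes are assumed).
import numpy as np
import h5py

# Stand-in for the decoded training images: 10 dummy 32x32 RGB images + labels.
images = np.random.randint(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)
labels = np.arange(10, dtype=np.int64)

# Write everything into a single file (one Drive transfer instead of many).
with h5py.File("gtsrb_train.h5", "w") as f:
    f.create_dataset("images", data=images, compression="gzip")
    f.create_dataset("labels", data=labels)

# Later (e.g. inside a PyTorch Dataset's __init__), load it back in one go.
with h5py.File("gtsrb_train.h5", "r") as f:
    imgs = f["images"][:]
    lbls = f["labels"][:]

print(imgs.shape, lbls.shape)  # (10, 32, 32, 3) (10,)
```

The idea would be to run the packing step once, keep the `.h5` file on Drive (or copy it to Colab's local disk), and have the Dataset index into the arrays instead of opening individual files.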