Loading a dataset in Google Colab takes much longer than it does locally in a Jupyter Notebook
|-- datasets
| |-- train_folder
| | |-- 00
| | | |-- 0
| | | | |-- file2161.jpg
| | | | |-- file2162.jpg
| | | | |-- file2163.jpg
| | | | |-- file2164.jpg
| | |
| |-- test_folder
| | |-- 01
| | | |-- 1
| | | | |-- file1161.jpg
| | | | |-- file1162.jpg
| | | | |-- file1163.jpg
| | | | |-- file1164.jpg
| | |
This is the organization of my dataset.
import torch
from torch.utils.data import Dataset
from torchvision import transforms
from torchvision.datasets import ImageFolder

class SmileDataset(Dataset):
    def __init__(self, data_root):
        # ImageFolder expects data_root to contain one subfolder per class
        self.samples = ImageFolder(data_root)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx1):
        # Each sample is a (PIL image, class index) pair
        return self.samples[idx1]

ds_rude = SmileDataset('../mindnotix/smile-detection-master/datasets/train_folder/00')
ds_smile = SmileDataset('../mindnotix/smile-detection-master/datasets/train_folder/01')

# Decode and convert every image to a tensor, one file at a time
trans1 = transforms.ToTensor()
ds_rude = [trans1(img) for img, l in ds_rude]
ds_smile = [trans1(img) for img, l in ds_smile]
ds_rude, ds_smile = torch.stack(ds_rude), torch.stack(ds_smile)
I used the code above to load the dataset with PyTorch.
When I run it locally in a Jupyter Notebook, it finishes within a few seconds, since there are only 2000 images in the train and test folders.
But when I run it in Google Colab, it takes several minutes.
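In case the one-image-at-a-time loop is part of the problem, this is a batched variant I could switch to, where a DataLoader's worker processes decode the JPEGs in parallel (just a sketch; the batch_size and num_workers values are arbitrary):

import torch
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import ImageFolder

# Let ImageFolder apply ToTensor itself, then batch through a DataLoader
# so worker processes do the JPEG decoding in parallel
folder = ImageFolder('../mindnotix/smile-detection-master/datasets/train_folder/00',
                     transform=transforms.ToTensor())
loader = DataLoader(folder, batch_size=64, num_workers=2)
ds_rude = torch.cat([imgs for imgs, labels in loader])  # same for ds_smile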
Is there any way to speed up the data loading in Google Colab?
I did change the runtime to GPU, but that doesn’t speed it up.
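If the slowdown comes from reading thousands of small files off a mounted Google Drive, would copying the dataset to the Colab VM's local disk first and then loading from there help? Roughly this (the /content/drive/MyDrive path is only an assumption about where the files live):

import shutil

# Assumption: the dataset sits on a mounted Google Drive.
# Copy it once to the Colab VM's local disk, then load from there.
src = '/content/drive/MyDrive/smile-detection-master/datasets'
dst = '/content/datasets'
shutil.copytree(src, dst)

ds_rude = SmileDataset('/content/datasets/train_folder/00')
ds_smile = SmileDataset('/content/datasets/train_folder/01')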
Thanks in advance!