Speeding up dataloaders on Google Colab

Hello good people! I have read questions similar to this one, but implementing the suggested changes didn't really get me anywhere: increasing the number of workers (the model itself is fast, so I figured that wouldn't be necessary) and setting `pin_memory=True`, which actually helped a little. Still, there is a way to go. My dataset contains 3000 images of size 384x512 in .bmp format, and here is the code for the dataset:

```
import os

from PIL import Image
from skimage import io  # assuming skimage.io is where imread comes from
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms


class MyDataset(Dataset):
    def __init__(self, imagenames, refDir, ssimDir, transform=None):
        self.refDir = refDir
        self.ssimDir = ssimDir
        self.imagesName = imagenames
        self.transform = transform

    def __getitem__(self, index):
        # every call reads one distorted image and its SSIM map from the Drive mount
        x = Image.fromarray(io.imread(os.path.join(self.refDir, self.imagesName[index])))
        y = Image.fromarray(io.imread(os.path.join(self.ssimDir, self.imagesName[index])))
        if self.transform:
            x = self.transform(x)
            y = self.transform(y)
        return x, y

    def __len__(self):
        return len(self.imagesName)


transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (1.0, 1.0, 1.0))
])
disortedDir = r'/content/gdrive/MyDrive/tid2013/distorted_images'
ssimMapsDir = r'/content/gdrive/MyDrive/tid2013/ssimMaps'
# X_train is the list of image file names, defined earlier
training_data = MyDataset(X_train, disortedDir, ssimMapsDir, transform)
```
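One idea I have not tried yet is reading every .bmp from Drive once in `__init__` and keeping the decoded arrays in RAM (3000 images of 384x512x3 is roughly 1.7 GB per image set as uint8, so both sets should still fit in Colab's memory). A rough, untested sketch of what I mean; `CachedDataset` is just a name I made up for this variant:

```
import os

from PIL import Image
from skimage import io
from torch.utils.data import Dataset


class CachedDataset(Dataset):
    """Untested sketch: read every image from Drive once, then serve from RAM."""

    def __init__(self, imagenames, refDir, ssimDir, transform=None):
        self.transform = transform
        self.pairs = []
        for name in imagenames:
            # one-time read of the distorted image and its SSIM map
            x = io.imread(os.path.join(refDir, name))
            y = io.imread(os.path.join(ssimDir, name))
            self.pairs.append((x, y))

    def __getitem__(self, index):
        x, y = self.pairs[index]
        x = Image.fromarray(x)
        y = Image.fromarray(y)
        if self.transform:
            x = self.transform(x)
            y = self.transform(y)
        return x, y

    def __len__(self):
        return len(self.pairs)
```

Would caching like this be the sensible thing to do here, or is it overkill?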

And here is the code for the dataloader:

```
training_loader = DataLoader(training_data,
                             batch_size=batch_size,
                             shuffle=True,
                             pin_memory=True)
```
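For reference, this is roughly what I mean by increasing the number of workers; the `num_workers` value shown is just an example:

```
training_loader = DataLoader(training_data,
                             batch_size=batch_size,
                             shuffle=True,
                             num_workers=2,   # illustrative value only
                             pin_memory=True)
```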
`batch_size` is 32 when training the model, and here is how I train it:
```
import torch.nn.functional as F
from six.moves import xrange  # my assumption: xrange comes from six.moves, as in the original notebook

for i in xrange(num_training_updates):
    (data, ssim) = next(iter(training_loader))
    data = data.to(device)
    ssim = ssim.to(device)
    optimizer.zero_grad()

    vq_loss, data_recon, perplexity = model(data)
    recon_error = F.mse_loss(data_recon, ssim) / data_variance
    loss = recon_error + vq_loss
    loss.backward()

    optimizer.step()
```

This is based on Aäron van den Oord's PyTorch code on Google Colab.
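I also wondered whether calling `next(iter(training_loader))` on every update, which builds a fresh iterator each time, adds overhead on top of the slow reads. A rough, untested sketch of keeping one iterator alive across updates (same variables as above):

```
data_iterator = iter(training_loader)

for i in xrange(num_training_updates):
    try:
        data, ssim = next(data_iterator)
    except StopIteration:
        # the loader is exhausted once per pass over the data; start a new pass
        data_iterator = iter(training_loader)
        data, ssim = next(data_iterator)

    data = data.to(device)
    ssim = ssim.to(device)
    # ... the rest of the update step stays the same ...
```

Would that alone make a difference, or is the per-image read from Drive the real problem?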

In the original code, they used the PyTorch MNIST dataset as follows:

```
#                                   transform=transforms.Compose([
#                                       transforms.ToTensor(),
#                                       transforms.Normalize((0.5,0.5,0.5), (1.0,1.0,1.0))
#                                   ]))
```

With MNIST it takes only seconds, but when I use my own custom dataset it takes several minutes, and most of the training time is spent fetching the data. How can I improve this? Is this even normal, given that the images in my dataset are much bigger?
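One other thing I am considering is copying the .bmp files from the Drive mount to the Colab VM's local disk once per session and pointing the dataset at the local copy, since reading thousands of small files through the Drive mount is often reported to be slow. Something along these lines (untested sketch; paths as in my code above):

```
import os
import shutil

# one-time copy per Colab session: Drive mount -> fast local storage
local_root = '/content/tid2013'
shutil.copytree('/content/gdrive/MyDrive/tid2013/distorted_images',
                os.path.join(local_root, 'distorted_images'))
shutil.copytree('/content/gdrive/MyDrive/tid2013/ssimMaps',
                os.path.join(local_root, 'ssimMaps'))

disortedDir = os.path.join(local_root, 'distorted_images')
ssimMapsDir = os.path.join(local_root, 'ssimMaps')
```

Is that the right direction, or should I be caching the decoded images in RAM instead (as in the sketch further up)?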