DataLoader is super slow

I preprocessed a video and selected 5 consecutive RGB frames to form tensors of shape 15 x 256 x 256, each saved to disk as file.pt. Here is my DataLoader, but it is super slow. Any insights into what the issue could be and how to speed it up?
I am using a batch size of 128 and 4 workers. With 32 workers, I even get an exception.

import torch
from torch.utils.data import DataLoader, Dataset

train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, num_workers=4, shuffle=True)

class LipSeqDataset(Dataset):
    def __init__(self, num_classes, base_path, transform, image_count, fake_classes):
        super().__init__()
        self.num_classes = num_classes
        self.fake_classes = fake_classes
        self.root_dir = base_path
        self.image_count = image_count
        self.keys = get_dir_path(self.root_dir, self.num_classes, self.fake_classes, self.image_count)

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, idx):

        input_images = self.keys
        # Assign label to class
        try:
            input_images = [(t, 0) if "orig" in t else (t, 1) if "df" in t else (t, 2) if "fs" in t else (t, 3) if "f2f" in t else (t, 4) for t in input_images]
            input_img_as_tensor = torch.load(input_images[idx][0])
        except Exception:
            # Print the offending entry, then re-raise so the traceback is kept
            print(input_images[idx][0])
            raise

        return input_img_as_tensor, input_images[idx][1]

Did you measure the data loading time during the training or just the first iteration?
Note that the first step will spin up all workers, and each will load a complete batch, which might introduce some warmup time.
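To separate the warmup from the steady state, you could time how long each batch takes to arrive and ignore the first one. This is only a minimal sketch: `time_batch_waits`, `warmup`, and `max_batches` are names made up for illustration, and it works on any iterable, so you can pass your `train_loader` directly.

```python
import time


def time_batch_waits(loader, warmup=1, max_batches=20):
    """Return the seconds spent waiting for each batch, skipping the first `warmup` batches."""
    waits = []
    batches = iter(loader)  # for a DataLoader, this is where the workers are started
    for i in range(warmup + max_batches):
        start = time.perf_counter()
        try:
            next(batches)
        except StopIteration:
            break  # the loader ran out of data before max_batches was reached
        elapsed = time.perf_counter() - start
        if i >= warmup:
            waits.append(elapsed)
    return waits


# Hypothetical usage with the loader from the question:
#     waits = time_batch_waits(train_loader)
#     print(f"mean wait per batch: {sum(waits) / len(waits):.3f} s")
```

Since no training step runs between batches here, the numbers reflect pure loading throughput. If they stay high after the first batch, the loading itself is likely the bottleneck rather than the startup.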
Also, where is your data stored? Is it on a local SSD or some other hard drive?
Have a look at this post, which gives a good summary of potential bottlenecks.
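One more thing that stands out in the posted code: `__getitem__` rebuilds the full (path, label) list for every key on each call, so loading a single sample costs a pass over the whole dataset. Assuming the labels depend only on the file path, as the code suggests, they could be computed once in `__init__` instead. A rough sketch follows; `label_for_path` and `build_samples` are hypothetical helper names, and the substring checks are kept in the same order as in your code.

```python
def label_for_path(path):
    """Map a file path to a class index using the same substring checks as the posted code."""
    if "orig" in path:
        return 0
    if "df" in path:
        return 1
    if "fs" in path:
        return 2
    if "f2f" in path:
        return 3
    return 4


def build_samples(keys):
    """Pair every path with its label once, instead of on every __getitem__ call."""
    return [(path, label_for_path(path)) for path in keys]


# Assumed wiring into LipSeqDataset (not tested against your data):
#
#     In __init__, once:
#         self.samples = build_samples(self.keys)
#
#     In __getitem__, per item:
#         path, label = self.samples[idx]
#         return torch.load(path), label
```

With this change each `__getitem__` call does one list lookup plus one `torch.load`, so whatever time remains should mostly be disk reads and deserialization.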