DataLoader/ImageFolder slow with very low CPU usage

I am having serious speed issues using ImageFolder and DataLoader to feed my model. I am loading 128x128 PNG image frames from the KTH dataset stored on my local HDD. Initially training is relatively fast for a few iterations, using about 50% of my CPU, but then it slows to a crawl with just 5% CPU usage and very slow loading. I am not doing anything special other than the standard transformations below. My disk load sits around 80%, which suggests the program is IO bound. Is there a recommended way in PyTorch to preload large images, resize them, and keep the results in memory so the DataLoader is not starved by IO? (I've sketched what I have in mind after the code.)

import torch
import torchvision
from torchvision import transforms

input_size = 60
device = "cuda"

transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize(input_size),
    transforms.ToTensor(),
])
# raw string so '\t' in the Windows path is not parsed as a tab
dataset = torchvision.datasets.ImageFolder(r'path\to\dataset', transform)
train_loader = torch.utils.data.DataLoader(
    dataset, batch_size=64, shuffle=True, num_workers=4, pin_memory=True)

for batch_idx, (data, target) in enumerate(train_loader):
    data, target = data.to(device), target.to(device)
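
For reference, this is roughly what I had in mind: a minimal sketch of a dataset that eagerly loads and resizes every frame once, then serves batches purely from RAM. I'm assuming the resized grayscale frames fit in memory, and CachedImageFolder is just a name I made up:

import torch
import torchvision
from torch.utils.data import Dataset

class CachedImageFolder(Dataset):
    # Hypothetical sketch: pay the disk IO cost once up front, so that
    # every later __getitem__ call only touches in-memory tensors.
    def __init__(self, root, transform):
        folder = torchvision.datasets.ImageFolder(root, transform)
        self.samples = [folder[i] for i in range(len(folder))]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.samples[index]  # (image tensor, class index)

dataset = CachedImageFolder(r'path\to\dataset', transform)
train_loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)

Is something like this reasonable, or is there a built-in way to do it?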

If your data is stored on an HDD, I doubt you can do much to avoid the IO bottleneck besides maybe reducing the batch size. Based on your description your code is IO bound, so even pre-computing the preprocessing would not change anything on its own.
An SSD would provide a speedup, if that's an option.

Also, have a look at this thread about a similar issue. Maybe you could try some of the suggestions there, e.g. storing the preprocessed dataset in a single HDF5 file so that reads come from one large contiguous file instead of thousands of small PNGs (sketched below).
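
A preprocessing-once-into-HDF5 approach could look roughly like this. It's only a sketch assuming h5py is installed; the file name frames.h5, the dataset names images/labels, and the 60x60 output size (your 128x128 frames after Resize(60)) are made up for illustration:

import h5py
import torch
import torchvision
from torchvision import transforms
from torch.utils.data import Dataset

# One-off conversion: read every PNG once, preprocess, and store the
# results contiguously in a single file on disk.
folder = torchvision.datasets.ImageFolder(
    r'path\to\dataset',
    transforms.Compose([transforms.Grayscale(),
                        transforms.Resize(60),
                        transforms.ToTensor()]))
with h5py.File('frames.h5', 'w') as f:
    images = f.create_dataset('images', (len(folder), 1, 60, 60), dtype='float32')
    labels = f.create_dataset('labels', (len(folder),), dtype='int64')
    for i, (img, label) in enumerate(folder):
        images[i] = img.numpy()
        labels[i] = label

class H5Dataset(Dataset):
    # Reading from one large file avoids the per-file open/seek overhead
    # of thousands of small PNGs, which is what usually hurts an HDD most.
    def __init__(self, path):
        self.path = path
        self.file = None  # opened lazily, once per DataLoader worker
        with h5py.File(path, 'r') as f:
            self.length = len(f['labels'])

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if self.file is None:
            self.file = h5py.File(self.path, 'r')
        img = torch.from_numpy(self.file['images'][index])
        return img, int(self.file['labels'][index])

The lazy file opening in __getitem__ matters: each DataLoader worker is a separate process, and sharing one open h5py handle across processes tends to cause trouble, so each worker opens its own handle on first access.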