I am having serious speed issues using ImageFolder and DataLoader to feed my model. I am loading 128x128 PNG frames from the KTH dataset stored on my local HDD. Training is initially fast for a few iterations, using about 50% of my CPU, but then it slows to a crawl with just 5% CPU usage and very slow loading. I am not doing anything special beyond the standard transformations below. My disk load sits around 80%, which suggests the program is IO-bound. Is there a recommended way in PyTorch to preload large images, resize them, and keep the result in memory so the DataLoader is not starved by IO?
import torch
import torchvision
from torchvision import transforms

input_size = 60
device = "cuda"

transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize(input_size),  # int: shorter side resized to 60
    transforms.ToTensor(),
])
# Raw string so the backslashes are not treated as escapes ('\t' is a tab)
dataset = torchvision.datasets.ImageFolder(r'path\to\dataset', transform)
train_loader = torch.utils.data.DataLoader(
    dataset, batch_size=64, shuffle=True, num_workers=4, pin_memory=True)

for batch_idx, (data, target) in enumerate(train_loader):
    data, target = data.to(device), target.to(device)
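To be concrete, this is the kind of thing I have in mind: a minimal sketch of a wrapper dataset that runs every sample through the transforms once up front and keeps the resulting tensors in RAM, so later epochs never hit the disk. `CachedDataset` is my own name and structure, not an existing PyTorch API; I am assuming the resized grayscale tensors fit comfortably in memory.

```python
import torch
from torch.utils.data import Dataset

class CachedDataset(Dataset):
    """Hypothetical helper: eagerly materializes every (image, label) pair
    from `base_dataset` (all IO, decoding, and transforms happen once in
    __init__) and serves the cached tensors afterwards."""

    def __init__(self, base_dataset):
        # One full pass over the dataset; this is the only slow part.
        self.samples = [base_dataset[i] for i in range(len(base_dataset))]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # Pure in-memory lookup: no file access after the initial pass.
        return self.samples[idx]
```

The idea would then be to pass `CachedDataset(dataset)` to the DataLoader instead of `dataset`; since everything is already in memory, even `num_workers=0` might be fast enough. Is something like this reasonable, or is there a built-in mechanism I am missing?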