I’m new to PyTorch, and I have some questions about the dataloader. I’m training a ResNet-50 on the ImageNet 2012 challenge dataset, using the official dataloader from the ImageNet example here.
Here is my code:

```python
import torch
import torchvision.datasets as datasets
import torchvision.transforms as transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
train_dataset = datasets.ImageFolder(
    traindir,
    transforms.Compose([
        transforms.RandomResizedCrop(224),  # RandomSizedCrop is the deprecated name
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ]))
train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=64, shuffle=True,
    num_workers=8, pin_memory=True)
```
I’ve read this topic, and I understand that the data is first loaded and augmented on the CPU by the worker processes, placed in CUDA pinned memory, and then transferred to the GPU.
I encounter two problems while training this ResNet-50 model.
(1) A mini-batch of 64 images takes 0.6s to train, but the computation takes only 0.3s, which means half of the time is spent fetching data from the HDD. Is this normal?
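For scale, a quick budget check using the numbers above (these are my measurements, not hard limits): to keep the GPU fully busy, the 8 workers together would have to deliver one batch every 0.3s, which leaves each worker a fixed time budget per image for reading, decoding, and augmenting:

```python
# Rough per-image time budget for the data workers, based on the
# measured numbers above (0.3 s of GPU compute per 64-image batch).
batch_size = 64
compute_s = 0.3     # GPU time per mini-batch (measured)
num_workers = 8

# The workers together must produce one batch every `compute_s` seconds,
# so each individual worker gets num_workers * compute_s seconds per batch.
budget_per_batch_s = num_workers * compute_s            # 2.4 s
budget_per_image_ms = budget_per_batch_s / batch_size * 1000
print(f"budget per image: {budget_per_image_ms:.1f} ms")
```

If reading and augmenting a JPEG from the HDD takes longer than that budget on average, the GPU will necessarily sit idle part of the time, which would explain the 0.6s total versus 0.3s of compute.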
(2) Every 8 (= num_workers) mini-batches, training takes 1 or 2 extra seconds. The per-batch times look like this:
mini_batch1: 1.6s mini_batch2: 0.6s mini_batch3: 0.6s mini_batch4: 0.6s
mini_batch5: 0.6s mini_batch6: 0.6s mini_batch7: 0.6s mini_batch8: 0.6s
mini_batch9: 2.5s mini_batch10: 0.6s mini_batch11: 0.6s mini_batch12: 0.6s
mini_batch13: 0.6s mini_batch14: 0.6s mini_batch15: 0.6s mini_batch16: 0.6s……
I’ve tried setting num_workers to 2 or 4, and the same thing happens: 1 or 2 extra seconds every num_workers mini-batches. Is this normal? Why does it happen?
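One way such a pattern could arise (this is a guess, with made-up numbers): if all num_workers workers start loading at roughly the same time, batches become ready in waves of num_workers, and the GPU stalls once per wave when it runs ahead of the loaders. A toy simulation of that idea, where W, LOAD, and the wave model are all assumptions, reproduces the observed shape:

```python
# Toy model (hypothetical): each of W workers spends LOAD seconds
# preparing one batch, and all workers start together, so batches
# become ready in waves of W at t = LOAD, 2*LOAD, 3*LOAD, ...
W = 8          # num_workers (assumed)
LOAD = 6.5     # assumed per-worker loading time for one batch, seconds
COMPUTE = 0.6  # measured GPU time per mini-batch, seconds

t = 0.0        # simulated wall-clock time
times = []     # observed duration of each mini-batch
for i in range(24):
    ready = (i // W + 1) * LOAD   # when this batch's worker finishes it
    start = max(t, ready)          # the GPU may have to wait for the data
    times.append(round(start - t + COMPUTE, 1))
    t = start + COMPUTE
print(times)
```

Under this model, every W-th mini-batch shows an extra stall while all the others take exactly COMPUTE seconds, which matches the periodic slowdown I see.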
By the way, I’m training this ResNet-50 model on a machine with an i7-6700K, 16 GB RAM, a GTX 1070 (8 GB), and an HDD.
Thank you so much!