Question about DataLoader mechanism

Hi, everyone,

I'm new to PyTorch and I have some questions about the DataLoader. I'm training a ResNet-50 on the ImageNet Challenge 2012 dataset, and I use the official DataLoader setup from here.

Here is my code:

import torch
from torchvision import datasets, transforms

# Standard ImageNet channel statistics
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# traindir is the path to the ImageNet training folder (one subdirectory per class)
train_dataset = datasets.ImageFolder(
    traindir,
    transforms.Compose([
        transforms.RandomResizedCrop(224),   # formerly RandomSizedCrop
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ]))

train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=64, shuffle=True,
    num_workers=8, pin_memory=True)

I've read this topic, and I understand that data is first prefetched onto the CPU, augmented, put into CUDA pinned memory, and then transferred to the GPU.
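Concretely, I picture that last step (pinned memory to GPU) looking roughly like this inside the training loop (just a sketch of my understanding):

for images, target in train_loader:
    # With pin_memory=True the batch arrives in page-locked host memory,
    # so the copy to the GPU can be non-blocking and overlap with compute.
    images = images.cuda(non_blocking=True)
    target = target.cuda(non_blocking=True)
    # ... forward / backward / optimizer step ...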
I ran into two problems while training this ResNet-50 model.

(1) A mini-batch of 64 images takes 0.6s to train, but the computation itself takes only 0.3s, which means half of the time goes to fetching data from the HDD. Is this normal? (The rough timing code I use is sketched below, after point (2).)

(2) Every 8 (= num_workers) mini-batches, training takes 1 or 2 extra seconds. The per-batch times look like this:

mini_batch1: 1.6s mini_batch2: 0.6s mini_batch3: 0.6s mini_batch4: 0.6s
mini_batch5: 0.6s mini_batch6: 0.6s mini_batch7: 0.6s mini_batch8: 0.6s
mini_batch9: 2.5s mini_batch10: 0.6s mini_batch11: 0.6s mini_batch12: 0.6s
mini_batch13: 0.6s mini_batch14: 0.6s mini_batch15: 0.6s mini_batch16: 0.6s ...

I've tried setting num_workers to 2 or 4, and the same thing happens: 1 or 2 extra seconds every num_workers mini-batches. Is this normal? Why does it happen?
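For reference, this is roughly how I measure the per-batch times (just a sketch; model, criterion and optimizer are created elsewhere, and I call torch.cuda.synchronize() so that asynchronous CUDA kernels don't hide the compute time):

import time
import torch

model.train()
end = time.time()
for i, (images, target) in enumerate(train_loader):
    data_time = time.time() - end             # time spent waiting on the DataLoader

    images = images.cuda(non_blocking=True)
    target = target.cuda(non_blocking=True)

    output = model(images)
    loss = criterion(output, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    torch.cuda.synchronize()                   # wait for the GPU before reading the clock
    print('mini_batch{}: {:.1f}s (data: {:.1f}s)'.format(i + 1, time.time() - end, data_time))
    end = time.time()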

By the way, I'm training this ResNet-50 model on a machine with an i7-6700K, 16 GB of RAM, a GTX 1070 (8 GB), and an HDD.

Thank you so much!

In my experience, the Volatile GPU-Util reported by nvidia-smi should stay at around 100%. Maybe your HDD is too slow, or maybe your CPU isn't fast enough to keep up with fetching the data.

P.S. You can use watch --color -n1 nvidia-smi to watch the GPU utilization.
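If you prefer logging it from Python, something like this also works (a sketch; it assumes the pynvml bindings, e.g. the nvidia-ml-py package, are installed):

import time
import pynvml  # assumption: installed separately, e.g. the nvidia-ml-py package

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)        # first GPU
for _ in range(30):
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    print('GPU util: {}%  mem util: {}%'.format(util.gpu, util.memory))
    time.sleep(1)
pynvml.nvmlShutdown()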

Thank you so much for your reply. I checked my GPU utilization: it jumps between 0% and 100%, which I think means I'm not feeding data to the GPU fast enough. I also checked the CPU usage; it's around 55% with num_workers=8. So is the HDD the bottleneck for training speed, or can I optimize it through code?

Really appreciate your help!

I'd guess the HDD is the bottleneck, or your CPU isn't powerful enough.

P.S. You can use tmpfs to keep the images in memory if you have enough RAM; that would be even faster than an SSD.
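Code-wise, the main knobs are on the DataLoader itself; something like the following is worth experimenting with (a sketch only; prefetch_factor and persistent_workers exist only in newer PyTorch releases, so drop them on an older install):

train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=64,
    shuffle=True,
    num_workers=8,            # try a few values; more workers only help if the disk can keep up
    pin_memory=True,          # page-locked host memory for faster, asynchronous H2D copies
    prefetch_factor=4,        # batches loaded ahead per worker (newer PyTorch only)
    persistent_workers=True,  # keep workers alive between epochs (newer PyTorch only)
)

But if the disk itself can't deliver the images fast enough, no DataLoader setting will fully hide that.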

I’ll try that, thank you so much!