Is there an efficient way to load the MS1M dataset?

I am trying to train MobileFaceNet on MS1M-IBUG (85K ids / 3.8M images).

But I am seeing low CPU and GPU utilization. The CPU stays at around 30%, while the GPU jumps to 70% for a second and then sits at 0% most of the time.
Below is one of my dataloader implementations. Another implementation converts the MXNet record to ordinary JPEG files beforehand and reads them with PIL/OpenCV, but it is also very slow.
I tried num_workers and pin_memory, but neither significantly speeds up data loading. However, if I skip the loading part and generate random tensors instead, the GPU reaches 90% utilization.
Any suggestions?

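import torch
import torchvision.transforms as transforms
import mxnet as mx
from mxnet import recordio
from torch.utils.data import Dataset


class MyDataset(Dataset):
    def __init__(self, mxnet_record='train.rec', mxnet_idx='train.idx'):
        # Indexed reader over the MXNet record file (random access by key).
        self.record = recordio.MXIndexedRecordIO(mxnet_idx, mxnet_record, 'r')
        # The transform list was cut off in the original post. ToPILImage and
        # ToTensor are assumed here so that the flip receives a PIL image and
        # the DataLoader receives a tensor.
        self.transform = transforms.Compose([transforms.ToPILImage(),
                                             transforms.RandomHorizontalFlip(),
                                             transforms.ToTensor()])

    def __len__(self):
        return 3804846  # hard-coded image count (about 3.8M)

    def __getitem__(self, index):
        # Assumes record keys line up with dataset indices; add an offset
        # if your .idx file starts at a different key.
        header, s = recordio.unpack(self.record.read_idx(index))
        image = mx.image.imdecode(s).asnumpy()  # HWC uint8 array
        label = int(header.label)
        image = self.transform(image)
        return image, torch.tensor(label, dtype=torch.long)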

This post explains some common data loading bottlenecks and proposes workarounds as well, so you might want to take a look at it. :slight_smile:
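
Before trying any workaround, it can help to measure the loader on its own, with no model in the loop. The following is only a minimal sketch: `measure_throughput` is a hypothetical helper (not part of PyTorch), and it assumes the loader is any iterable that yields (images, labels) batches, such as a torch.utils.data.DataLoader.

```python
import itertools
import time


def measure_throughput(loader, max_batches=50):
    """Time iteration over `loader` alone, without any model work.

    `loader` is assumed to be an iterable of (images, labels) batches,
    for example a torch.utils.data.DataLoader.
    Returns (samples_seen, elapsed_seconds).
    """
    samples = 0
    start = time.perf_counter()
    # islice stops after max_batches without fetching an extra batch.
    for images, labels in itertools.islice(loader, max_batches):
        samples += len(labels)  # works for tensors, lists, and arrays
    elapsed = time.perf_counter() - start
    return samples, elapsed
```

Calling this on loaders built with different num_workers values and dividing samples by seconds gives a samples-per-second figure to compare against the rate the GPU consumes when fed random tensors. Note that the first batch includes worker start-up time, so a larger max_batches gives a steadier number.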