Dataloader is slow with mps

putskan · May 11, 2023, 11:20am

For some reason, the dataloader is much slower when using mps, to the point where it's better to use cpu.

Any ideas why?

code for reproduction:

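import torch

class Dataset(torch.utils.data.Dataset):
    def __init__(self, device):
        # A single scalar tensor that lives on the target device
        self.a = torch.tensor(1, device=device)

    def __len__(self):
        return 100

    def __getitem__(self, i):
        # Every sample is the same pair of device tensors
        return self.a, self.a

# Time fetching the first batch of 64 on each device
# (%time is an IPython/Jupyter magic)
for device in ['mps', 'cpu']:
    dataloader = torch.utils.data.DataLoader(Dataset(device), 64)
    %time next(iter(dataloader))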

Thanks in advance!

marksaroufim · May 12, 2023, 1:19am

I see two problems:

1. Overhead: your tensors are tiny and you don’t load all that many items, so sending data to your GPU is slower than just doing the computation directly on the CPU.
2. Your benchmark is also problematic: you’re not doing any actual computation on the GPU, so just sending data to it won’t give you any benefit. GPUs are fast at matrix multiplication, but transferring data to them is comparatively slow.
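
To make the overhead point concrete, here is a minimal sketch of one common alternative: keep the samples on the CPU, let the DataLoader collate them there, and move each batch to the device in a single transfer. The names `pick_device` and `CpuDataset` are hypothetical, and the sketch falls back to the CPU when MPS is unavailable. It is an illustration under those assumptions, not a claim about what causes the slowdown in the original repro.

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


def pick_device():
    # Fall back to the CPU when MPS is not available, so the sketch runs anywhere.
    return torch.device("mps" if torch.backends.mps.is_available() else "cpu")


class CpuDataset(Dataset):
    """Hypothetical variant of the repro: samples stay on the CPU."""

    def __init__(self, n=100):
        self.a = torch.tensor(1.0)  # created on the CPU
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        return self.a, self.a


device = pick_device()
loader = DataLoader(CpuDataset(), batch_size=64)

start = time.perf_counter()
x, y = next(iter(loader))  # the batch is collated on the CPU
x = x.to(device)           # one transfer per batch instead of per-sample device tensors
y = y.to(device)
elapsed = time.perf_counter() - start

print(f"first batch on {device}: {elapsed * 1e3:.2f} ms, shape {tuple(x.shape)}")
```

With tensors this small, the timing mostly reflects fixed overhead rather than throughput, so the numbers say little about a real workload.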

putskan · May 14, 2023, 7:00am

Thanks for the reply, @marksaroufim!
I understand this is only a toy example which doesn’t take into account the benefits of the GPU.
However, when I use it in a real training/eval process, the GPU and CPU runs take approximately the same time (the GPU’s speedup is cancelled out by the slow data loading).

Therefore I’m not able to benefit from the GPU.
I believe it’s a bit too slow. What do you think?
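
One way to check whether data loading really is what cancels out the GPU speedup is to time loading and compute separately inside the loop. The following is a minimal sketch, assuming a stand-in `TensorDataset` and a small linear model in place of the real ones (both hypothetical), and falling back to the CPU when MPS is unavailable. The `sync` helper is also an assumption of this sketch: GPU work is queued asynchronously, so it waits for the device before each clock read.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")


def sync():
    # Wait for queued GPU work to finish so the timings are attributed correctly.
    if device.type == "mps":
        torch.mps.synchronize()


# Hypothetical stand-ins for a real dataset and model.
data = TensorDataset(torch.randn(512, 128), torch.randint(0, 10, (512,)))
loader = DataLoader(data, batch_size=64)
model = torch.nn.Linear(128, 10).to(device)

load_s = 0.0
compute_s = 0.0
t0 = time.perf_counter()
for x, y in loader:
    x, y = x.to(device), y.to(device)
    sync()
    t1 = time.perf_counter()
    load_s += t1 - t0      # fetching the batch and moving it to the device

    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    sync()
    t0 = time.perf_counter()
    compute_s += t0 - t1   # forward and backward pass

print(f"loading: {load_s:.4f} s, compute: {compute_s:.4f} s")
```

If the loading share dominates, the dataset and transfer path are the place to look; if compute dominates, the model may simply be too small to benefit from the GPU.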