Dataloader slows down when training with Mac MPS

@rtwolfe94022 It turns out that the dataloader's speed is fine. Most of the time is spent in _loss.cpu().detach().numpy(), which synchronizes the GPU: the call blocks until all queued GPU work has finished. My timing code had been counting that wait as part of dataloader_time. In my case, making the batch size smaller relieves the problem.
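
For anyone who runs into the same misleading numbers, here is a minimal sketch of a timer that synchronizes at the section boundaries, so pending GPU work is not billed to whichever section happens to trigger the next implicit sync. `SectionTimer` and its `sync` argument are hypothetical names of mine, not part of PyTorch. I am assuming that on MPS you would pass `torch.mps.synchronize` (available in recent PyTorch releases) as `sync`; the class itself only needs the standard library.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SectionTimer:
    """Accumulate wall-clock time per named section of a training loop.

    `sync` is an optional zero-argument callable that blocks until all
    queued device work has finished (assumption: torch.mps.synchronize
    on Apple silicon, torch.cuda.synchronize on CUDA).
    """

    def __init__(self, sync=None):
        self.sync = sync
        self.totals = defaultdict(float)

    @contextmanager
    def section(self, name):
        if self.sync is not None:
            self.sync()  # drain work queued before this section starts
        start = time.perf_counter()
        try:
            yield
        finally:
            if self.sync is not None:
                self.sync()  # wait for work queued inside this section
            self.totals[name] += time.perf_counter() - start


# Hypothetical usage inside a training loop:
#
#   timer = SectionTimer(sync=torch.mps.synchronize)
#   with timer.section("dataloader"):
#       batch = next(batches)
#   with timer.section("step"):
#       loss = train_step(batch)
#   print(dict(timer.totals))
```

The extra synchronization slows training down a little, so this is something to use while profiling and then remove. With it in place, the wait that previously showed up at the .cpu() call should appear under the section that actually launched the GPU work.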

ref: Calling loss.item() is very slow
