DataLoader 20x slower than loop variant for mini-batch

While developing an MNIST classifier I ran into a performance problem. I thought I'd try out the DataLoader, since I'd seen it used in an online course. This resulted in an unbearably large performance difference, as seen in my benchmarks.

I benchmarked gradient descent and mini-batch gradient descent, each with the loop variant and the train_loader (my DataLoader instance) variant.

While I have figured out that I can improve performance, and thereby cut the time, by setting num_workers=8 on the DataLoader, it still performs poorly. Mini-batch with the train_loader on GPU takes about 1 min 20 s, which is roughly on par with mini-batch with the loop on CPU; that can't possibly be the best result achievable with the DataLoader. Is there anything else I forgot to set on the DataLoader? Or is this simply the reality of using it?
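For reference, a minimal sketch of the DataLoader settings that usually close the gap to a hand-written loop (the tensor shapes and values here are synthetic stand-ins, not your actual MNIST data): a larger batch_size amortizes per-batch overhead, num_workers > 0 loads batches in background processes, pin_memory=True speeds up host-to-GPU copies, and persistent_workers=True avoids re-spawning the workers every epoch.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Synthetic stand-in for the MNIST tensors (hypothetical shapes)
images = torch.randn(1024, 1, 28, 28)
labels = torch.randint(0, 10, (1024,))
dataset = TensorDataset(images, labels)

loader = DataLoader(
    dataset,
    batch_size=256,           # larger batches amortize per-batch overhead
    shuffle=True,
    num_workers=2,            # background worker processes load batches
    pin_memory=True,          # pinned host memory speeds up .cuda() copies
    persistent_workers=True,  # keep workers alive across epochs
)

n_batches = 0
for xb, yb in loader:
    # With pin_memory=True, non_blocking=True lets the copy overlap compute:
    # xb = xb.cuda(non_blocking=True)  # uncomment on a GPU machine
    n_batches += 1
```

The worker count worth using depends on the machine; on a 6-core i7-8700K, values between 4 and 8 are typical starting points rather than a fixed rule.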

P.S. CPU: i7-8700K, GPU: Aorus 1080 Ti

There really shouldn’t be as much difference between the two, both for the time and the accuracy :thinking: .

There is a mistake in your code, however. When testing accuracy on the test set, you use the model in training mode. You should instead call model.eval() when evaluating (so that train/eval-sensitive layers like batch norm are set to eval mode) and also wrap the whole test_accuracy function in with torch.no_grad() (so that gradients don't get computed). When you do that, you also need to call model.train() again before training.

I’m not sure how this relates to your time and accuracy problem, particularly regarding the difference between using a loop and a data loader, but it’s the correct way of doing it and it might help :slight_smile: .