DataLoader becomes slow after a while


I use a DataLoader to do inference. The transform is just CenterCrop, normalization, and ToTensor. At the beginning, the speed is about a quarter to half a second per batch:

Test: [20/19532] Time 0.567 (2.62705732527) Prec@1 [82.8125] ([82.92411]))
Test: [30/19532] Time 0.255 (1.90457838581) Prec@1 [84.375] ([82.7495]))
Test: [40/19532] Time 0.265 (1.54226525237) Prec@1 [87.109375] ([83.0221]))
Test: [50/19532] Time 0.272 (1.31763061823) Prec@1 [80.859375] ([83.17249]))
Test: [60/19532] Time 0.280 (1.16401662592) Prec@1 [82.421875] ([83.38242]))
Test: [70/19532] Time 0.349 (1.05755999055) Prec@1 [81.25] ([83.428696]))
Test: [80/19532] Time 0.492 (0.974306159549) Prec@1 [86.71875] ([83.55999]))

But it becomes much slower after a while. Here is the log output:

Test: [9870/19532] Time 8.239 (4.82513771539) Prec@1 [98.828125] ([95.81966]))
Test: [9880/19532] Time 0.291 (4.82656297883) Prec@1 [98.4375] ([95.82219]))
Test: [9890/19532] Time 4.884 (4.82562276992) Prec@1 [99.609375] ([95.82543]))
Test: [9900/19532] Time 7.214 (4.82822921033) Prec@1 [97.265625] ([95.82779]))
Test: [9910/19532] Time 9.636 (4.829314595) Prec@1 [98.828125] ([95.83042]))
Test: [9920/19532] Time 0.228 (4.82823389156) Prec@1 [98.046875] ([95.833534]))
Test: [9930/19532] Time 15.800 (4.83102715987) Prec@1 [98.828125] ([95.83614]))

Has anyone met a similar problem? Or am I using PyTorch wrong? I am running the ImageNet sample code from the PyTorch examples. I need to decide whether to use PyTorch or another framework. Thanks.

What is the Time number supposed to mean? Why do you think it is a data loading problem? What does the script look like, and how did you run it?

Time is in seconds. I use the PyTorch ImageNet training example and only run the evaluation code.
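For anyone reading the logs above: a minimal sketch of how a line like "Time 0.567 (2.627)" is usually produced, assuming the first number is the latest batch time and the number in parentheses is a running average over all batches so far. The class name `RunningAverage` and the sample timings are made up for illustration and are not taken from the script in question.

```python
class RunningAverage:
    """Tracks the latest value and the mean of all values seen so far."""

    def __init__(self):
        self.val = 0.0    # most recent value
        self.total = 0.0  # sum of all values
        self.count = 0    # number of values seen

    def update(self, value):
        self.val = value
        self.total += value
        self.count += 1

    @property
    def avg(self):
        return self.total / self.count if self.count else 0.0


meter = RunningAverage()
# Hypothetical per-batch timings in seconds: three fast batches and one stall.
for seconds in [0.25, 0.25, 8.0, 0.25]:
    meter.update(seconds)
    print(f"Time {meter.val:.3f} ({meter.avg:.3f})")
```

Under that assumption, one long stall pulls the average up for many batches afterwards, so the first number on each line is the one to watch when looking for where the slowdown starts.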

Seconds of what? Did you modify the script? And again, what makes you think it is the DataLoader's problem?

The unit of Time is seconds. I modified the code to return image names. It is pure evaluation code; the image sizes and the transform operations are the same from batch to batch. There is shuffle, not a sampler. If the problem is not caused by the DataLoader, what else could cause this?

Do you hit this problem every time? I would guess the issue is caused by heavy I/O on your machine (say, many threads are running and the cores are shared).
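One way to test that guess is to time the wait for the next batch separately from the model step. Below is a rough sketch, not the code from the ImageNet example: `profile_loop`, `step`, and `slow_loader` are hypothetical names, and the slow loader only simulates a stalled disk with `time.sleep`.

```python
import time


def profile_loop(loader, step):
    """Return (data_seconds, compute_seconds), one entry per batch.

    `loader` is any iterable of batches; `step` is a hypothetical callable
    that runs the model on one batch.
    """
    data_times, compute_times = [], []
    batches = iter(loader)
    while True:
        start = time.perf_counter()
        try:
            batch = next(batches)  # blocks while the loader prepares data
        except StopIteration:
            break
        fetched = time.perf_counter()
        step(batch)
        # On a GPU, call torch.cuda.synchronize() here so that asynchronous
        # kernels are counted in the compute time.
        done = time.perf_counter()
        data_times.append(fetched - start)
        compute_times.append(done - fetched)
    return data_times, compute_times


def slow_loader(n, delay):
    """Stand-in for a data loader whose I/O takes `delay` seconds per batch."""
    for i in range(n):
        time.sleep(delay)
        yield i


data_times, compute_times = profile_loop(slow_loader(3, 0.02), lambda b: None)
for d, c in zip(data_times, compute_times):
    print(f"data {d:.3f}s  compute {c:.3f}s")
```

If the data column jumps to several seconds while the compute column stays flat, the stall is in loading (disk, page cache, or too few workers) rather than in the model.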

Hi, I met the same problem. Have you solved it?

Hi, same problem here, with 1,200,000 images. Any help would be appreciated, thanks.