Dataloader becomes slow after a while

Hi,

I use a DataLoader to do inference. The transform is just CenterCrop, Normalize, and ToTensor (a minimal sketch of my setup is at the end of this post). At the beginning the speed is about half a second per batch:

Test: [20/19532] Time 0.567 (2.62705732527) Prec@1 [82.8125] ([82.92411]))
Test: [30/19532] Time 0.255 (1.90457838581) Prec@1 [84.375] ([82.7495]))
Test: [40/19532] Time 0.265 (1.54226525237) Prec@1 [87.109375] ([83.0221]))
Test: [50/19532] Time 0.272 (1.31763061823) Prec@1 [80.859375] ([83.17249]))
Test: [60/19532] Time 0.280 (1.16401662592) Prec@1 [82.421875] ([83.38242]))
Test: [70/19532] Time 0.349 (1.05755999055) Prec@1 [81.25] ([83.428696]))
Test: [80/19532] Time 0.492 (0.974306159549) Prec@1 [86.71875] ([83.55999]))

But the speed becomes quite slow after a while. Here is the output later in the run:

Test: [9870/19532] Time 8.239 (4.82513771539) Prec@1 [98.828125] ([95.81966]))
Test: [9880/19532] Time 0.291 (4.82656297883) Prec@1 [98.4375] ([95.82219]))
Test: [9890/19532] Time 4.884 (4.82562276992) Prec@1 [99.609375] ([95.82543]))
Test: [9900/19532] Time 7.214 (4.82822921033) Prec@1 [97.265625] ([95.82779]))
Test: [9910/19532] Time 9.636 (4.829314595) Prec@1 [98.828125] ([95.83042]))
Test: [9920/19532] Time 0.228 (4.82823389156) Prec@1 [98.046875] ([95.833534]))
Test: [9930/19532] Time 15.800 (4.83102715987) Prec@1 [98.828125] ([95.83614]))

Has anyone run into a similar problem, or am I using PyTorch incorrectly? I am running the ImageNet sample code from the PyTorch examples. I need to decide whether I should use PyTorch or another framework. Thanks.
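For reference, the loading side of my script looks roughly like this. This is a minimal sketch, not my exact code: the dataset path, batch size, and worker count are placeholders, and the normalization constants are the usual ImageNet ones from the example.

```python
import torch
import torchvision.datasets as datasets
import torchvision.transforms as transforms

# Evaluation transform: CenterCrop, ToTensor, Normalize (as described above)
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

val_dataset = datasets.ImageFolder(
    '/path/to/val',                      # placeholder dataset path
    transforms.Compose([
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        normalize,
    ]))

val_loader = torch.utils.data.DataLoader(
    val_dataset,
    batch_size=256,                      # placeholder batch size
    num_workers=4,                       # placeholder worker count
    pin_memory=True)
```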

What is the Time number supposed to mean? Why do you think the problem is in data loading? What does the script look like, and how did you run it?

Time is in seconds. I use the PyTorch ImageNet training example and only use its evaluation code.
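To be concrete about what Time means: the first number is the time for the current batch and the number in parentheses is the running average so far. The example computes it with an AverageMeter around the loader loop, roughly like this (a from-memory sketch, not the exact script; val_loader is the loader from my first post):

```python
import time

class AverageMeter(object):
    """Tracks the latest value and a running average."""
    def __init__(self):
        self.val = 0.0
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

batch_time = AverageMeter()
end = time.time()
for i, (images, target) in enumerate(val_loader):
    # ... forward pass and accuracy computation go here ...
    batch_time.update(time.time() - end)   # covers data loading + compute for this batch
    end = time.time()
    if i % 10 == 0:
        print('Test: [{0}/{1}] Time {2:.3f} ({3:.3f})'.format(
            i, len(val_loader), batch_time.val, batch_time.avg))
```

Since the timer spans the whole loop iteration, a growing Time includes any extra waiting on the dataloader.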

Seconds of what? Did you modify the script? And again, what made you think it is the dataloader's problem?

The unit of Time is seconds. I modified the code to return the image names. It is pure evaluation code: the image size is the same and the transform operations are the same from batch to batch. There is shuffling, but no custom sampler. If the problem is not caused by the dataloader, what other possible reason could cause this?
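To be clear about the modification, it is only something like this (a hypothetical sketch of an ImageFolder subclass that also returns the file path; the class name is made up, not my exact code):

```python
import torchvision.datasets as datasets

class ImageFolderWithNames(datasets.ImageFolder):
    """ImageFolder that additionally returns the path of each image."""
    def __getitem__(self, index):
        image, target = super(ImageFolderWithNames, self).__getitem__(index)
        path, _ = self.imgs[index]       # self.imgs is a list of (path, class_index)
        return image, target, path
```

Returning the path is just a string lookup, so it should not add any I/O per sample.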

Do you meet this problem every time? I would guess this issue is caused by heavy I/O operations on your machine (say, many threads are in progress and the cores are shared). :sweat_smile:
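One way to check that guess is to pin down the thread and worker counts and time the data loading separately from the compute. A rough sketch (val_dataset stands for whatever dataset your script builds; batch size and num_workers are placeholders to tune for your machine):

```python
import time
import torch

torch.set_num_threads(1)                 # keep the main process from oversubscribing CPU cores

val_loader = torch.utils.data.DataLoader(
    val_dataset,                         # the dataset from the original script
    batch_size=256,                      # placeholder batch size
    num_workers=2,                       # try a small, fixed number of worker processes
    pin_memory=True)

end = time.time()
for i, (images, target) in enumerate(val_loader):
    data_time = time.time() - end        # time spent waiting on the dataloader
    # ... forward pass and accuracy computation go here ...
    total_time = time.time() - end       # data loading + compute for this batch
    end = time.time()
    if i % 10 == 0:
        print('batch {}: data {:.3f}s / total {:.3f}s'.format(i, data_time, total_time))
```

If data_time is what grows over the run while the rest stays flat, the loader (disk or page-cache pressure) is the likely bottleneck; if everything stays flat with fewer workers, it was thread contention.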

Hi, I met the same problem. Have you solved it?

Hi, same problem here. I have 1,200,000 images and see the same behavior. Any help would be appreciated, thanks.