Bottleneck in data loading

Hi,

I have a bottleneck in the data loading during training. I ran cProfile and these are the results:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.012    0.012 1820.534 1820.534 models.py:15(fit)
       56    0.001    0.000 1808.163   32.289 dataloader.py:775(__next__)
       52    0.001    0.000 1807.264   34.755 dataloader.py:742(_get_data)
      392    0.016    0.000 1807.263    4.610 dataloader.py:711(_try_get_data)
      392    0.006    0.000 1807.178    4.610 queues.py:91(get)
      392    0.002    0.000 1806.842    4.609 connection.py:253(poll)
      392    0.002    0.000 1806.840    4.609 connection.py:413(_poll)
      392    0.009    0.000 1806.837    4.609 connection.py:906(wait)
      392    0.004    0.000 1806.810    4.609 selectors.py:402(select)
      392 1806.805    4.609 1806.805    4.609 {method 'poll' of 'select.poll' objects}
        4    0.000    0.000    6.452    1.613 dataloader.py:274(__iter__)
        4    0.016    0.004    6.452    1.613 dataloader.py:635(__init__)
      128    0.007    0.000    5.553    0.043 process.py:101(start)
      128    0.001    0.000    5.531    0.043 context.py:221(_Popen)
      128    0.003    0.000    5.530    0.043 context.py:274(_Popen)
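
(In case it is useful, I collected the profile roughly like the sketch below; `model` and `train_loader` are placeholders for my actual objects.)

```python
import cProfile
import pstats

# Profile the training entry point and dump stats sorted by cumulative time.
# "model" and "train_loader" are placeholders for my actual objects.
cProfile.run("model.fit(train_loader)", "fit.prof")
pstats.Stats("fit.prof").sort_stats("cumulative").print_stats(25)
```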

I am using 32 workers (I have 40 CPUs available).
Do you know what is causing the data loading to be slow? Do you know what the files queues.py and connection.py are? The functions there seem to be taking up a great part of the time.

Cheers :slight_smile:

Have a look at this post, which explains some potential bottlenecks and workarounds. :slight_smile:
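
One quick way to confirm it really is the data loading (and not the training step itself) is to time the wait for each batch separately. A rough sketch, assuming `train_loader` is your DataLoader:

```python
import time

wait = 0.0
end = time.time()
for batch in train_loader:           # your existing DataLoader
    wait += time.time() - end        # time spent blocked waiting for the next batch
    # ... your forward/backward/optimizer step on `batch` goes here ...
    end = time.time()
print(f"spent {wait:.1f}s waiting on the DataLoader")
```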

Both connection.py and queues.py seem to be from Python's multiprocessing library.
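
The long time in `select.poll` just means the main process is sitting idle, waiting for the worker processes to push finished batches into that queue, so the workers themselves (disk reads, decoding, transforms) are the slow part. A few `DataLoader` settings are usually worth experimenting with; the values below are only illustrative starting points, not recommendations:

```python
from torch.utils.data import DataLoader

loader = DataLoader(
    dataset,                  # your existing Dataset
    batch_size=64,            # illustrative value
    num_workers=8,            # fewer workers can beat 32 if disk I/O is the limit
    pin_memory=True,          # page-locked memory speeds up host-to-GPU copies
    persistent_workers=True,  # keep workers alive between epochs (PyTorch >= 1.7)
    prefetch_factor=4,        # batches prefetched per worker (PyTorch >= 1.7)
)
```

If the wait stays high no matter how you tune these, the dataset's `__getitem__` (decoding, augmentations, reading from a slow disk or network filesystem) is probably where to optimize.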