Why is DataLoader slower when pin_memory = True?


I have been trying to optimize the way I am loading my image data, and I considered using pin_memory=True. However, when I profiled my code (using torch.utils.bottleneck), I noticed something unusual.

(All experiments were run for 1 epoch on an 8 GB image dataset (LRS2).)
When pin_memory=True, most of the time is spent in "method 'acquire' of '_thread.lock' objects", with a cumtime of 1934 seconds.

When pin_memory=False, the cumtime is 1127 seconds.
I find this quite unusual, as all my models are on the GPU, and from reading several forum posts it seems that pin_memory=True is the way to go when working on the GPU.

Any insight into this matter would be highly appreciated.
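For reference, here is a minimal sketch of the setup being profiled, using a random stand-in dataset in place of LRS2 (the dataset, batch size, and worker count here are placeholders, not the original configuration). Pinned memory only pays off when paired with asynchronous non_blocking=True copies to the GPU:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the image dataset: random tensors.
data = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 10, (64,))
dataset = TensorDataset(data, labels)

# pin_memory=True asks the DataLoader to copy each batch into
# page-locked (pinned) host memory, which allows the subsequent
# host-to-device copy to run asynchronously.
loader = DataLoader(dataset, batch_size=16, num_workers=0, pin_memory=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
for images, targets in loader:
    # non_blocking=True only overlaps the copy with compute
    # when the source tensor lives in pinned memory.
    images = images.to(device, non_blocking=True)
    targets = targets.to(device, non_blocking=True)
```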

Your system might suffer if you pin too much memory for the data: pinned (page-locked) memory cannot be swapped out, so other processes won't be able to use it and their performance might decrease. For this reason it's also not enabled by default.

Great! Thank you for the clarification.