I have been trying to optimize the way I am loading my image data, and I considered using pin_memory=True. However, when I profiled my code (using torch.utils.bottleneck), I noticed something unusual.
(All experiments were run for 1 epoch on an 8 GB image dataset, LRS2.)
With pin_memory=True, most of the time is spent in “method ‘acquire’ of ‘_thread.lock’ objects”, with a cumtime of 1934 seconds.
With pin_memory=False, the cumtime is 1127 seconds.
I find this quite unusual, as all my models are on the GPU, and from reading several forum posts it seems that pin_memory=True is the way to go when training on the GPU.
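For reference, here is a minimal sketch of the kind of setup I mean. The dataset here is a random-tensor stand-in (not my actual LRS2 pipeline), and my understanding, which may be wrong, is that pin_memory=True mainly pays off when the GPU copy uses non_blocking=True, since pinning alone just adds an extra host-side copy into page-locked memory:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the real image dataset (LRS2 loading not shown).
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 10, (64,))
dataset = TensorDataset(images, labels)

use_cuda = torch.cuda.is_available()
loader = DataLoader(
    dataset,
    batch_size=16,
    num_workers=0,        # real code would typically use several workers
    pin_memory=use_cuda,  # only pin when a GPU transfer will follow
)

device = torch.device("cuda" if use_cuda else "cpu")
for batch_images, batch_labels in loader:
    # non_blocking=True can overlap the host-to-device copy with compute,
    # but only when the source tensors live in pinned memory.
    batch_images = batch_images.to(device, non_blocking=True)
    batch_labels = batch_labels.to(device, non_blocking=True)
```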
Any insight into this matter would be highly appreciated.