Then comes the question: why is pin_memory False by default in DataLoader? I tried to recall the little I learned in operating systems classes. Does pinning memory mean that once a batch is pinned, it always stays in memory until the process ends?
Pinned memory is page-locked memory. It is easy for users to shoot themselves in the foot if they enable page-locked memory for everything, because it can't be pre-empted (the OS cannot page it out). That is why we did not make it default to True.
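To make "page-locked" concrete, here is a minimal sketch of pinning one explicit staging buffer rather than enabling pinning everywhere. The helper name `make_staging_buffer` is hypothetical; it assumes PyTorch is installed and falls back to ordinary pageable memory when no CUDA device is available.

```python
import torch


def make_staging_buffer(shape, use_pinned: bool = True) -> torch.Tensor:
    """Allocate a CPU buffer, page-locked only if requested and CUDA is present.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    # Pinned allocations go through the CUDA runtime, so guard on availability.
    pinned = use_pinned and torch.cuda.is_available()
    return torch.empty(shape, pin_memory=pinned)


buf = make_staging_buffer((1024, 1024))
# is_pinned() reports whether this tensor lives in page-locked memory.
print(buf.is_pinned())
```

Pinning a single, known-size buffer like this keeps the amount of locked memory easy to reason about.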
This post from 2017 solved my problem. It’s 2023 now, but I am still facing this problem caused by pin_memory, and it took me a while to figure it out. I mean, the documentation really needs to spell out the risk of using pin_memory instead of just recommending that people use it.
This is an advanced tip. If you overuse pinned memory, it can cause serious problems when running low on RAM, and you should be aware that pinning is often an expensive operation.
Also, pin_memory is not set to True by default in, e.g., the DataLoader. If you have any suggestions on where more warnings are needed, could you please share them?
Thanks for your reply and the warning info in the docs (which I didn’t take seriously at first).
But I still wonder in what situations pin_memory=True becomes a problem. Could you give some examples? It would be great if you could point us to some articles that give some intuition about it.
Pinning memory prevents the system from using this memory for its own allocations. A very simple example of undesired behavior would be pinning too much memory, which would then force your OS into memory thrashing and moving pages into swap, thus tanking the performance of your entire system.
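For some intuition about how "too much" can add up, here is a back-of-the-envelope sketch. It rests on a loud assumption: that every batch in flight (roughly `num_workers * prefetch_factor` per DataLoader) may sit in pinned memory at once, in every training process. The function name and the example numbers are hypothetical, and the result is an estimate, not a measurement of what PyTorch actually allocates.

```python
def estimate_pinned_bytes(batch_bytes: int, num_workers: int,
                          prefetch_factor: int = 2, num_processes: int = 1) -> int:
    """Rough estimate of page-locked host memory held by pinned batches.

    Assumption (not a guarantee): each in-flight batch is pinned, and each
    process (e.g. one per GPU under DDP) runs its own DataLoader.
    """
    in_flight = max(1, num_workers * prefetch_factor)
    return batch_bytes * in_flight * num_processes


# Example: float32 batches of shape 256 x 3 x 224 x 224 (147 MiB each),
# 8 workers, default prefetch_factor of 2, 4 processes.
batch_bytes = 256 * 3 * 224 * 224 * 4
total = estimate_pinned_bytes(batch_bytes, num_workers=8, num_processes=4)
print(f"{total / 2**30:.2f} GiB")  # prints "9.19 GiB"
```

Comparing a number like this against the machine's free RAM gives a quick sanity check before turning the flag on.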
I’ve spent a considerable amount of time debugging DDP training on 4 GPUs. At some random point, the training process slowed down by a factor of 3. Everything became very slow, and this persisted even after training finished. Only a server restart fixed the slowdown. It was quite hard to find the real cause of the problem; we even suspected hardware problems, but in the end the whole thing came down to one flag: pin_memory=True.
I would really recommend setting pin_memory to False in all cases at first, and only then setting it to True to check whether it adds any speedup.
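One way to run that check is a small timing harness like the sketch below. It uses synthetic data and a hypothetical helper name, `time_epoch`, and it only measures loading plus the host-to-device copy, so treat the numbers as a rough signal rather than a benchmark of a real training loop.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def time_epoch(pin_memory: bool, num_batches: int = 50) -> float:
    """Return the seconds taken to iterate one epoch and copy each batch to the device."""
    # Synthetic data (an assumption for illustration): batches of 64 x 3 x 32 x 32 floats.
    dataset = TensorDataset(torch.randn(num_batches * 64, 3, 32, 32))
    loader = DataLoader(dataset, batch_size=64, pin_memory=pin_memory)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    start = time.perf_counter()
    for (batch,) in loader:
        # non_blocking=True can only overlap the copy when the source is pinned.
        batch = batch.to(device, non_blocking=True)
    if device.type == "cuda":
        # Wait for pending asynchronous copies before stopping the clock.
        torch.cuda.synchronize()
    return time.perf_counter() - start
```

On a machine with a GPU, calling `time_epoch(False)` and `time_epoch(True)` a few times each and comparing the results shows whether pinning helps for your batch sizes; if the difference is within noise, leaving the flag off is the safer choice.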