I am facing this issue even with the updated PyTorch nightly version. The DataLoader memory usage continuously increases until it runs out of memory. My Dataset is 26 GB when initialized; it contains an ndarray from which I return an element based on the index value. After running on 10% of the data it ends up using another 30+ GB of RAM plus 40+ GB of swap space. I tried upgrading PyTorch to the latest nightly version and tried both Python 3.6 and Python 3.7, but the issue persists.
I just load the whole dataset in the `__init__()` of my Dataset; everything else is as normal. Be sure to make `num_workers` less than the physical CPU count. I don't know why, but it works well when your memory is limited. E.g., my dataset is about 30 GB, and I can run on a two-GPU machine with 80 GB of memory; `num_workers` is 10 and the physical CPU count is 12.