Hello,
I am writing a multi-worker data loading pipeline. My inputs are high-resolution images that I decode directly on the GPU, while my targets are ordinary labels that I load into CPU memory and move to the GPU only when the main (trainer) process accesses them.
My problem is that I cannot use pin_memory=True with my DataLoader: it also tries to pin the tensors that are already on the GPU instead of skipping them, and raises this error:
RuntimeError: cannot pin 'torch.cuda.ByteTensor' only dense CPU tensors can be pinned
Does anyone know a workaround for this? I would like to use pinned memory to speed up the CPU-to-GPU data transfer, but I have not found a simple solution yet.
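
To make the question concrete, here is a rough sketch of the behaviour I am after, not a confirmed fix. It assumes I leave pin_memory=False in the DataLoader and pin only the CPU tensors myself in the main process. The helper name pin_cpu_tensors is hypothetical (it is not part of PyTorch), and I am assuming the batch is a tensor or a nested dict/list/tuple of tensors.

```python
import torch


def pin_cpu_tensors(batch):
    """Hypothetical helper: pin dense CPU tensors, leave everything else as-is."""
    if isinstance(batch, torch.Tensor):
        # Only dense CPU tensors can be pinned, and pinning needs a CUDA runtime.
        can_pin = (
            batch.device.type == "cpu"
            and not batch.is_sparse
            and torch.cuda.is_available()
        )
        return batch.pin_memory() if can_pin else batch
    if isinstance(batch, dict):
        return {key: pin_cpu_tensors(value) for key, value in batch.items()}
    if isinstance(batch, list):
        return [pin_cpu_tensors(item) for item in batch]
    if isinstance(batch, tuple):
        # Note: a namedtuple would come back as a plain tuple here.
        return tuple(pin_cpu_tensors(item) for item in batch)
    # Strings, ints, None, etc. are returned unchanged.
    return batch
```

In the training loop I would then call something like labels = pin_cpu_tensors(labels).to("cuda", non_blocking=True), while the GPU-resident images pass through untouched. I am not sure this is as efficient as the built-in pinning, since the pinned copy is made in the main process, so a cleaner way to do this would be welcome.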