How to use DataLoader pin_memory when data is a mix of CPU and GPU tensors

Hello,

I am writing a multi-worker data loading pipeline. My inputs are high-resolution images that I decode directly on the GPU, while my targets are ordinary labels that I load into CPU memory and move to the GPU only when they are accessed by the main (trainer) process.
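For concreteness, the setup described above might look roughly like the minimal `Dataset` below. This is a sketch under assumptions: `MixedDeviceDataset` is a hypothetical name, `torch.rand` stands in for the real GPU image decoder (which the post does not show), and the device falls back to CPU so the snippet also runs on a machine without a GPU.

```python
import torch
from torch.utils.data import Dataset


class MixedDeviceDataset(Dataset):
    """Hypothetical dataset: images produced on the GPU, labels kept on the CPU."""

    def __init__(self, labels, image_shape=(3, 64, 64), device=None):
        # Labels stay in ordinary (pageable) CPU memory.
        self.labels = torch.as_tensor(labels, dtype=torch.long)
        self.image_shape = image_shape
        # Fall back to CPU so the sketch also runs without a GPU.
        self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Stand-in for decoding a high-resolution image directly into GPU memory.
        image = torch.rand(self.image_shape, device=self.device)
        # The target remains a CPU tensor.
        target = self.labels[idx]
        return image, target
```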

My problem is that I cannot use pin_memory with my DataLoader: it also tries to pin the GPU data instead of skipping it, and throws this error:
RuntimeError: cannot pin 'torch.cuda.ByteTensor' only dense CPU tensors can be pinned

Does anyone know of a workaround? I would like to use pinned memory to speed up the CPU-to-GPU transfers, but I have not yet found a simple solution.
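One possible workaround, sketched here under assumptions, relies on a documented hook in `torch.utils.data`: when `pin_memory=True` and the collated batch is a custom type, the DataLoader calls that type's own `pin_memory()` method instead of its default pinning. That method can pin only the CPU targets and leave the GPU images alone. `MixedBatch`, `collate_mixed` and `pin_if_cpu` are hypothetical names, and each sample is assumed to be an `(image, target)` pair.

```python
import torch
from torch.utils.data import DataLoader


def pin_if_cpu(t):
    """Pin a dense CPU tensor; return GPU (or other non-CPU) tensors unchanged."""
    if (t.device.type == "cpu" and t.layout == torch.strided
            and torch.cuda.is_available() and not t.is_pinned()):
        return t.pin_memory()
    return t


class MixedBatch:
    """Hypothetical batch type: GPU images plus CPU targets."""

    def __init__(self, samples):
        images, targets = zip(*samples)
        self.images = torch.stack(images)    # already on the GPU, never pinned
        self.targets = torch.stack(targets)  # CPU tensor, safe to pin

    def pin_memory(self):
        # Called by the DataLoader when pin_memory=True.
        # Only CPU tensors are pinned; GPU tensors are skipped.
        self.images = pin_if_cpu(self.images)
        self.targets = pin_if_cpu(self.targets)
        return self


def collate_mixed(samples):
    return MixedBatch(samples)
```

It would be used as `DataLoader(dataset, batch_size=..., num_workers=..., collate_fn=collate_mixed, pin_memory=True)`, followed by `batch.targets.to("cuda", non_blocking=True)` in the training loop. With `num_workers > 0`, the DataLoader runs this pinning in a background thread of the main process, so it stays off the training loop's critical path.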

You could explicitly create the target tensor using pinned memory or move it as described in this tutorial.
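A sketch of the manual route, under assumptions: the DataLoader is built with `pin_memory=False` so it never touches the GPU images, each batch is an `(images, targets)` pair, and the CPU targets are copied into pinned memory in the main process just before an asynchronous `non_blocking=True` transfer. `to_device_mixed` and `iterate_on_device` are hypothetical helper names.

```python
import torch
from torch.utils.data import DataLoader


def to_device_mixed(images, targets, device):
    """Move a (GPU images, CPU targets) batch onto `device`.

    Assumes the DataLoader was built with pin_memory=False, so pinning
    happens here, in the main process, and only for the CPU targets.
    """
    device = torch.device(device)
    if device.type == "cuda" and not targets.is_pinned():
        # Copy the CPU targets into page-locked memory so the
        # host-to-device copy below can run asynchronously.
        targets = targets.pin_memory()
    # No-op for tensors that are already on `device` (e.g. GPU-decoded images).
    images = images.to(device, non_blocking=True)
    targets = targets.to(device, non_blocking=True)
    return images, targets


def iterate_on_device(loader, device):
    """Yield batches from `loader` with both parts moved onto `device`."""
    for images, targets in loader:
        yield to_device_mixed(images, targets, device)
```

Unlike the DataLoader's built-in pinning, which runs in a background thread when `num_workers > 0`, this version pins in the training loop's own thread. For small label tensors the extra host copy is likely minor, but it is worth measuring.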