I am currently working with a dataset of around 2 TB, and the input to my network is an int8 tensor of shape batch size x 20 x 240 x 240. At the moment I transfer this dense tensor to the GPU, which takes around 10% of my per-batch compute time. Is it possible in PyTorch to build the input as a sparse tensor on the CPU, transfer it to the GPU, and then convert it back to a dense tensor before running it through my network? For context, I am training with DDP on a machine with four 2080 GPUs.
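To make the idea concrete, here is a minimal sketch of the round trip, not a tested solution for my pipeline. The batch size of 2 and the roughly 1% density are assumptions made purely for illustration, and the random tensor is a stand-in for real data. It also falls back to the CPU when no GPU is present so that it runs anywhere.

```python
import torch

# Hypothetical stand-in for one input batch; batch size and density are assumed.
batch_size = 2
shape = (batch_size, 20, 240, 240)
mask = torch.rand(shape) < 0.01                       # ~1% non-zero entries (assumption)
dense_cpu = (mask * torch.randint(1, 127, shape)).to(torch.int8)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Build a sparse COO tensor on the CPU: int64 indices plus int8 values.
sparse_cpu = dense_cpu.to_sparse().coalesce()

# Only the indices and values are copied to the device.
sparse_gpu = sparse_cpu.to(device)

# Rebuild the dense tensor on the device before feeding the network.
dense_gpu = sparse_gpu.to_dense()

# Rough payload comparison: each non-zero costs 4 int64 indices + 1 int8 value.
dense_bytes = dense_cpu.numel() * dense_cpu.element_size()
sparse_bytes = (
    sparse_cpu.indices().numel() * sparse_cpu.indices().element_size()
    + sparse_cpu.values().numel() * sparse_cpu.values().element_size()
)
print(f"dense: {dense_bytes} bytes, sparse: {sparse_bytes} bytes")
```

Because a 4-D COO tensor stores four int64 indices per non-zero value, each non-zero int8 entry costs about 33 bytes instead of 1, so this only reduces the transfer size when well under roughly 3% of the entries are non-zero. The cost of building the sparse tensor on the CPU would also need to be measured against the transfer time saved.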

One thought I had was to do the above and then multiply the sparse tensor by an identity matrix to get a dense tensor for further operations, but I was wondering whether there is a better method.
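For comparison, the sketch below shows the identity-multiply idea next to a direct `to_dense()` call on a tiny hypothetical 2-D float tensor. The shapes and values are made up for illustration. As far as I understand, `torch.sparse.mm` expects a 2-D sparse operand and may not support integer dtypes, so the int8 4-D input would presumably need a reshape and a cast first; treat that as an assumption rather than a confirmed limitation.

```python
import torch

# Tiny hypothetical example: a 6 x 4 float matrix with two non-zero entries.
dense = torch.zeros(6, 4)
dense[0, 1] = 3.0
dense[4, 2] = -2.0
sparse = dense.to_sparse()

# Option 1: convert the sparse tensor back to dense directly.
via_to_dense = sparse.to_dense()

# Option 2: sparse @ identity, which also yields a dense (strided) result.
via_identity = torch.sparse.mm(sparse, torch.eye(4))

print(torch.equal(via_to_dense, via_identity))
```

Both routes give the same dense values here, but the multiply performs extra arithmetic and needs the identity matrix in memory, so `to_dense()` looks like the more direct option if it supports the dtype in question.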