Cache entire train/test data on GPU

I have noticed that in some situations, a smaller batch size decreases error, but I want to minimize IO overhead for an epoch with many iterations.

It seems that the DataLoader(..., pin_memory=True) option is designed to speed up batch loading. What if I know that most of my train/test data sets will fit on the device? Is there a way to cache the entire data set on the GPU, so that I can tune the batch size as needed without suffering from increased IO overhead for small batches?


If your data is small enough to fit in GPU memory, you can store all of it on the GPU and index it there during training.

data = data.to('cuda')
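To make the idea concrete, here is a minimal sketch (not necessarily what the poster had in mind) that moves the whole data set to the device once and then slices minibatches directly from the device-resident tensors. The toy tensors, the `iterate_minibatches` helper, and the CPU fallback are assumptions added for illustration.

```python
import torch

# Assumption: fall back to CPU so the sketch also runs on a machine without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical toy data standing in for a real training set.
features = torch.randn(1000, 20)
labels = torch.randint(0, 2, (1000,))

# One-time transfer: after this, the whole data set lives on the device.
features = features.to(device)
labels = labels.to(device)


def iterate_minibatches(x, y, batch_size, shuffle=True):
    """Yield (x, y) minibatches by indexing tensors that are already on the device."""
    n = x.shape[0]
    if shuffle:
        order = torch.randperm(n, device=x.device)
    else:
        order = torch.arange(n, device=x.device)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]


# The batch size can now be changed freely without any per-batch host-to-device copy.
batch_sizes = [xb.shape[0] for xb, yb in iterate_minibatches(features, labels, batch_size=32)]
```

Since no batch crosses the host/device boundary, small batch sizes only pay the cost of indexing on the device.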

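Otherwise, you can set pin_memory=True in the DataLoader. This does not cache the data on the GPU; it places each batch in page-locked (pinned) host memory, which makes the copy from host to GPU faster.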
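For the case where the data does not fit on the GPU, a rough sketch of the pinned-memory route might look like the following. The toy `TensorDataset` and the batch size are assumptions; `pin_memory` is only switched on when CUDA is available so that the snippet also runs on a CPU-only machine.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# Hypothetical toy data kept in ordinary host memory.
dataset = TensorDataset(torch.randn(512, 8), torch.randint(0, 2, (512,)))

# pin_memory=True asks the loader to return batches in page-locked host memory.
loader = DataLoader(dataset, batch_size=64, shuffle=True, pin_memory=use_cuda)

seen = 0
for xb, yb in loader:
    # non_blocking=True lets the copy from pinned memory overlap with other work.
    xb = xb.to(device, non_blocking=True)
    yb = yb.to(device, non_blocking=True)
    seen += xb.shape[0]
```

Each batch still has to be copied to the device, so this reduces the transfer cost per batch rather than removing it.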

Please refer to this Q&A

Hope this helps.


Thanks @surya00060!

I added an option to do this inside my Dataset class, and it runs much faster now.
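The poster's actual class is not shown, but an option like that could be sketched roughly as follows. The class name `CachedTensorDataset`, its `device` argument, and the toy data are all hypothetical.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class CachedTensorDataset(Dataset):
    """Hypothetical Dataset that can optionally keep all of its tensors on one device."""

    def __init__(self, features, labels, device=None):
        if device is not None:
            # One-time transfer of the full data set to the target device.
            features = features.to(device)
            labels = labels.to(device)
        self.features = features
        self.labels = labels

    def __len__(self):
        return self.features.shape[0]

    def __getitem__(self, index):
        return self.features[index], self.labels[index]


# Assumption: fall back to CPU so the sketch also runs without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dataset = CachedTensorDataset(
    torch.randn(256, 10), torch.randint(0, 2, (256,)), device=device
)

# With data already on the GPU, keep loading in the main process (num_workers=0)
# and leave pin_memory off, since pinning only applies to host tensors.
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=0, pin_memory=False)
xb, yb = next(iter(loader))
```

Batches produced this way are already on the target device, so the training loop needs no further `.to(device)` call.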
