TensorDataset with lazy loading?

Yes, but you could also construct this huge file, or split it into several big files for convenience. Alternatively, the HDF5 format allows lazy loading and can carry metadata.

Anyway, it seems your use case would be better solved with a custom dataset that loads each file on the fly. Have a look at:
https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
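
For example, here is a minimal sketch of such a dataset, assuming each sample was saved as its own `.pt` file in one directory (the class name and directory layout are just placeholders for illustration):

```python
import os

import torch
from torch.utils.data import Dataset


class LazyFileDataset(Dataset):
    """Loads one tensor file from disk per __getitem__ call."""

    def __init__(self, root_dir):
        self.root_dir = root_dir
        # Only the file names are kept in memory, not the tensors.
        self.files = sorted(
            f for f in os.listdir(root_dir) if f.endswith(".pt")
        )

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        # Each sample is read from disk on demand.
        return torch.load(os.path.join(self.root_dir, self.files[idx]))
```

Wrapped in a `DataLoader` with `num_workers > 0`, the per-file reads happen in background worker processes, so the loading cost is largely hidden.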

So the short answer is:
If you have one huge dataset or array, use HDF5 or a memory map (see the sketch below).
If you have hundreds of small files, use a custom dataset as above.
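
For the HDF5 route, a minimal sketch using `h5py` could look like this (the file path and dataset key are placeholders; opening the file lazily inside `__getitem__` is a common pattern so that each `DataLoader` worker gets its own file handle):

```python
import h5py
import torch
from torch.utils.data import Dataset


class HDF5Dataset(Dataset):
    """Reads slices of an HDF5 dataset on demand instead of loading it whole."""

    def __init__(self, path, key):
        self.path = path
        self.key = key
        # Open once just to record the number of samples.
        with h5py.File(path, "r") as f:
            self.length = f[key].shape[0]
        self.file = None  # opened lazily, per worker process

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if self.file is None:
            self.file = h5py.File(self.path, "r")
        # h5py reads only the requested slice from disk.
        return torch.from_numpy(self.file[self.key][idx])
```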
