Assume I want to create a new dataset. My dataset consists of N “feature tensors” and N corresponding “label tensors”.
How would I go about creating a dataset from these tensors? What is the proper way of storing it to disk, and then retrieving it?
I could save everything into a single file using torch.save, then subclass the Dataset class and implement
__getitem__ by simply loading the whole dataset from disk and returning the correct element.
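To make that concrete, here is a minimal sketch of the naive approach I have in mind. The dict layout and the file name `dataset.pt` are just illustrative assumptions:

```python
import torch
from torch.utils.data import Dataset

class NaiveTensorDataset(Dataset):
    """Sketch of the naive approach: one torch.save'd file holding
    all N feature tensors and N label tensors (layout assumed)."""

    def __init__(self, path):
        self.path = path

    def __len__(self):
        return len(torch.load(self.path)["features"])

    def __getitem__(self, idx):
        # Re-reads the entire saved file on every access; this is what
        # makes the approach slow (and memory-hungry) for large N.
        data = torch.load(self.path)
        return data["features"][idx], data["labels"][idx]

# Illustrative usage: save N = 100 feature/label pairs, then index one.
features = torch.randn(100, 8)
labels = torch.randint(0, 2, (100,))
torch.save({"features": features, "labels": labels}, "dataset.pt")

ds = NaiveTensorDataset("dataset.pt")
x, y = ds[0]
```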
For very large N, though, this could be problematic: it would be very slow and occupy a lot of memory. So is there a better way, one that avoids loading everything into memory and also plays nicely with retrieving batches of random elements?