Why use torch dataset?

I understand that data.Dataset can be useful if we need to prepare our data (e.g., load it from disk, or do data augmentation). Is there any benefit to using Dataset if all of our data is already in memory and we are not doing augmentation?

We can use TensorDataset for in-memory data stored in tensors. That dataset is then fed to DataLoader, which produces shuffled batches.
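A minimal sketch of this pattern, assuming the in-memory data is a pair of tensors (the feature and label shapes here are just illustrative):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Some in-memory data: 100 samples with 3 features each, plus binary labels.
features = torch.randn(100, 3)
labels = torch.randint(0, 2, (100,))

# TensorDataset wraps the tensors; indexing it returns one (feature, label) pair.
dataset = TensorDataset(features, labels)

# DataLoader handles shuffling and batching on top of the dataset.
loader = DataLoader(dataset, batch_size=16, shuffle=True)

for x, y in loader:
    # x has shape (16, 3) for full batches; the final batch may be smaller.
    pass
```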

So it’s just so I don’t have to do my own shuffling?

Doing your own shuffling is not forbidden in PyTorch.

Essentially, yes: the point is that you don’t have to bother yourself with shuffling, efficient batching (for various types of inputs), or pinning memory.
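Those conveniences are all options on DataLoader itself. A small sketch (the dataset here is a stand-in; `pin_memory` is only enabled when a GPU is available, since page-locked memory mainly speeds up host-to-GPU transfers):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

dataset = TensorDataset(torch.randn(8, 4))

loader = DataLoader(
    dataset,
    batch_size=4,
    shuffle=True,                           # reshuffled every epoch
    pin_memory=torch.cuda.is_available(),   # faster CPU-to-GPU copies
)

for (x,) in loader:
    # Each x is a batch of shape (4, 4), assembled by the default collate function.
    pass
```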

As @Tony-Y stated, you can always introduce your own methods, but you don’t have to.