How to load your own dataset?

How should I inherit from the ‘Dataset’ class to load my data?
I don’t understand why I need to implement ‘__len__’ and ‘__getitem__’.

The __len__ method should return the length of your dataset.
In the standard use case, you would just return the total number of samples you would like to load.
For more advanced use cases, you could manipulate the length of the dataset artificially to e.g. support some custom sampling strategy.
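Here is a minimal sketch of both cases, assuming the samples already sit in memory as tensors. `TensorPairDataset` and `RepeatedDataset` are hypothetical names used only for illustration (PyTorch already ships `torch.utils.data.TensorDataset` for the first case). The wrapper reports three times the base length and maps each index back onto the base dataset with a modulo. This is one possible way to, e.g., make a single "epoch" iterate over a small dataset several times.

```python
import torch
from torch.utils.data import Dataset


class TensorPairDataset(Dataset):
    """Standard case (hypothetical example): __len__ returns the number of samples."""

    def __init__(self, data, targets):
        assert len(data) == len(targets), "data and targets must have the same length"
        self.data = data
        self.targets = targets

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index], self.targets[index]


class RepeatedDataset(Dataset):
    """Hypothetical wrapper that reports an artificially larger length."""

    def __init__(self, base, repeats):
        self.base = base
        self.repeats = repeats

    def __len__(self):
        # Pretend the dataset is `repeats` times larger than it really is.
        return len(self.base) * self.repeats

    def __getitem__(self, index):
        # Map any index in [0, len(self) - 1] back onto the base dataset.
        return self.base[index % len(self.base)]


base = TensorPairDataset(torch.arange(4.0).unsqueeze(1), torch.tensor([0, 1, 0, 1]))
repeated = RepeatedDataset(base, repeats=3)
```

With this setup, `len(base)` is 4 and `len(repeated)` is 12, and `repeated[5]` returns the same sample as `base[1]`.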

Your Dataset's __getitem__ will be called with indices in the range [0, len(dataset) - 1].
Using this index, you can implement the code to load each sample (e.g. each image file) and process it there.
This workflow enables you to lazily load the data, so that you can work with large datasets, which wouldn’t fit into your system RAM.
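As a sketch of this lazy-loading pattern, assume each sample is stored in its own .npy file. The class name `LazyNpyDataset` and this file layout are hypothetical. Only the file paths and labels are kept in memory, and each file is read inside __getitem__ when its index is requested. For images, you would swap `np.load` for the image library of your choice.

```python
import os
import tempfile

import numpy as np
import torch
from torch.utils.data import Dataset


class LazyNpyDataset(Dataset):
    """Hypothetical example: one sample per .npy file, loaded on demand."""

    def __init__(self, file_paths, labels):
        # Only the paths and labels are stored here, not the arrays themselves.
        self.file_paths = list(file_paths)
        self.labels = list(labels)

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, index):
        # The file is read from disk only when this sample is requested.
        array = np.load(self.file_paths[index])
        x = torch.from_numpy(array).float()
        y = torch.tensor(self.labels[index])
        return x, y


# Usage sketch: write three small arrays to a temporary folder, then index the dataset.
tmp_dir = tempfile.mkdtemp()
paths = []
for i in range(3):
    path = os.path.join(tmp_dir, f"sample_{i}.npy")
    np.save(path, np.full((2, 2), i, dtype=np.float32))
    paths.append(path)

dataset = LazyNpyDataset(paths, labels=[0, 1, 0])
x, y = dataset[1]  # only sample_1.npy is read here
```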

Also, have a look at the data loading tutorial for more information.
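To see how these two methods are used in practice, here is a small sketch that wraps a dataset in `torch.utils.data.DataLoader`. The loader uses __len__ to know which indices exist and calls __getitem__ for each one. `IndexEchoDataset` is a hypothetical toy class that simply returns its own index, so you can see which indices are requested and how they are batched.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class IndexEchoDataset(Dataset):
    """Hypothetical toy dataset whose samples are just their own indices."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        # Return the index so we can see what the DataLoader asked for.
        return torch.tensor(index)


# With shuffle=False the indices are requested in order, 0 .. len(dataset) - 1,
# and the default collate function stacks them into batches of size 2.
loader = DataLoader(IndexEchoDataset(5), batch_size=2, shuffle=False)
batches = [batch.tolist() for batch in loader]
# batches == [[0, 1], [2, 3], [4]]
```

Setting `shuffle=True` would request the same indices in a random order each epoch.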
