You can use the plain tensors as X_train and y_train, if you are able to load them completely (and push them to the GPU without sacrificing too much memory).
The Dataset is an abstraction that lets you load and process each sample of your dataset lazily, while the DataLoader takes care of shuffling/sampling/weighted sampling, batching, using multiprocessing to load the data, using pinned memory, etc.
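As a rough illustration of that split, here is a minimal sketch. The LazySquares class and its toy data are made up for this example; in practice __getitem__ would do the real per-sample work, such as reading a file from disk.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class LazySquares(Dataset):
    """Hypothetical dataset: each sample is created only when it is indexed."""

    def __init__(self, n_samples):
        self.n_samples = n_samples

    def __len__(self):
        return self.n_samples

    def __getitem__(self, idx):
        # Stand-in for lazy per-sample loading and preprocessing.
        x = torch.tensor([float(idx)])
        y = x ** 2
        return x, y


dataset = LazySquares(10)

# The DataLoader handles shuffling and batching. num_workers > 0 would load
# samples in worker processes, and pin_memory=True can help host-to-GPU copies.
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

batch_shapes = [tuple(x.shape) for x, y in loader]
```

With 10 samples and batch_size=4, the last batch is smaller because drop_last defaults to False.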
This tutorial might be helpful to see the advantages of using this approach.
That being said, you are of course fine to use the tensors directly, which might also be faster if you are using a tiny dataset.
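For comparison, a minimal sketch of using the tensors directly, assuming everything fits in memory. The random data, shapes, and batch size below are placeholders, not values from your setup.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder data: 10 samples with 3 features and binary labels.
X_train = torch.randn(10, 3).to(device)
y_train = torch.randint(0, 2, (10,)).to(device)

batch_size = 4
n = X_train.size(0)

# Draw a new permutation each epoch to shuffle, then slice out the batches.
perm = torch.randperm(n, device=device)
batches = []
for start in range(0, n, batch_size):
    idx = perm[start:start + batch_size]
    batches.append((X_train[idx], y_train[idx]))
```

Since the whole dataset is moved to the device once, there is no per-batch loading or transfer overhead, which is why this can be faster for tiny datasets.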