For datasets that come integrated with PyTorch, getting the inputs and labels as tensors is very easy.
In the case of MNIST, this is enough:
from torchvision import datasets
train_data = datasets.MNIST('.', train=True, download=True)
x_train, y_train = train_data.data, train_data.targets
But how do I create such tensors from a custom dataset defined by subclassing torch.utils.data.Dataset?
I have a dataset made up of (tensor, label) tuples. Each tensor has shape 180x180 and each label is an integer.
>>> dataset[0]
(tensor([[1.5628, 1.5679, 1.5588, ..., 1.6395, 1.6355, 1.6354],
[1.5106, 1.5402, 1.5627, ..., 1.5813, 1.6235, 1.6520],
[1.5924, 1.6069, 1.5967, ..., 1.5813, 1.5924, 1.5964],
...,
[1.5945, 1.6138, 1.6241, ..., 1.6181, 1.6243, 1.6018],
[1.6006, 1.6283, 1.6591, ..., 1.6047, 1.6047, 1.6161],
[1.6181, 1.6181, 1.6129, ..., 1.5833, 1.5679, 1.6110]]),
5)
How do I go from here to creating, say, x of torch.Size([5000, 180, 180]) and y of torch.Size([5000])?
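One approach that might work, as a minimal sketch: index the dataset once, split the (tensor, label) pairs apart, then use torch.stack for the tensors and torch.tensor for the integer labels. ToyDataset below is a hypothetical stand-in for my real class, and it holds only 10 samples instead of 5000 to keep the example light.

```python
import torch
from torch.utils.data import Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in: returns (180x180 float tensor, int label)."""

    def __init__(self, n=10):
        self.images = torch.randn(n, 180, 180)
        self.labels = [i % 10 for i in range(n)]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]


dataset = ToyDataset()

# Pull every sample once and separate tensors from labels.
xs, ys = zip(*(dataset[i] for i in range(len(dataset))))

x = torch.stack(xs)   # adds a new leading dimension: [N, 180, 180]
y = torch.tensor(ys)  # Python ints become an int64 tensor: [N]

print(x.shape, y.shape)
```

This assumes every sample has the same shape; torch.stack raises an error otherwise. It also materializes the whole dataset in memory, which is only reasonable when it fits.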
I then want to create a TensorDataset from x and y, and finally build a DataLoader from it.
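For reference, this is roughly the last step I have in mind, as a sketch only. The x and y here are random placeholders with 8 samples rather than my real 5000, and the batch size of 4 is an arbitrary choice for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder tensors standing in for the real x and y.
x = torch.randn(8, 180, 180)
y = torch.randint(0, 10, (8,))

# TensorDataset pairs up tensors along their first dimension.
tensor_ds = TensorDataset(x, y)

# DataLoader handles batching and shuffling.
loader = DataLoader(tensor_ds, batch_size=4, shuffle=True)

# Fetch one batch to check the shapes.
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)
```

As far as I understand, a DataLoader also accepts any torch.utils.data.Dataset directly, so the TensorDataset step may not be strictly required.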
This is a newbie question, and I cannot find the answer online.