I ran into a problem using torch.utils.data.TensorDataset and DataLoader.
As far as I know, I need to put the inputs and targets into data_x and data_y,
and write code like this:
dataset = TensorDataset(data_x, data_y)
train_loader = DataLoader(dataset=dataset, batch_size=4, shuffle=True)
But the problem is that my data_x is too large for my memory (only 8GB).
I cropped human faces from about 25K images and resized them to 224*224 to be the input of my network;
putting all of these faces into data_x exceeds my memory.
You don’t need to load all your images into memory; you can lazily load each sample as needed.
Just pass the image paths to your Dataset and load each image in __getitem__.
Have a look at the Data loading tutorial to see an example.
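The advice above could be sketched as a custom Dataset like the one below. The class name, the file list, and the transform argument are illustrative placeholders, not part of the original post:

```python
from torch.utils.data import Dataset
from PIL import Image

class FaceDataset(Dataset):
    """Sketch of a lazily loading Dataset: stores only file paths,
    so memory usage stays small regardless of dataset size."""

    def __init__(self, image_paths, transform=None):
        self.image_paths = image_paths  # list of paths, not image data
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, index):
        # The image file is opened only when this sample is requested
        image = Image.open(self.image_paths[index]).convert('RGB')
        if self.transform is not None:
            image = self.transform(image)
        return image
```

Wrapping this in a DataLoader then works exactly like the TensorDataset version, e.g. `DataLoader(FaceDataset(paths), batch_size=4, shuffle=True)`, except that each worker reads images from disk on demand.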
Oh, I understand. I need to save the faces cropped from the images as .jpg files and then pass the paths of the faces to the Dataset.
If you have the face coordinates or could calculate them on the fly, you could also crop the faces in your
__getitem__. This might be a bit slower than preprocessing the data and saving just the faces, but in the end it depends on your coding style, I guess.