Creating a Dataset from scikit-learn train_test_split

d3tk · October 27, 2022, 9:44pm

I have a dataset of images, each paired with a continuous value, and I’m using a CNN to predict that value. There are 14,000 images and 14,000 values. In a Keras workflow I would use scikit-learn’s train_test_split to get X_train, y_train, X_test, and y_test, and then call model.fit().

To train my model in PyTorch, do I combine X_train and y_train into a dataset and use a DataLoader, or is there another way, i.e. one that doesn’t use train_test_split?

My code so far:
The images are 360x360 grayscale, so the input shape is (1, 360, 360).

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=42)

X_train.shape is torch.Size([9814, 360, 360, 1])
y_train.shape is torch.Size([9814, 1])
X_test.shape is torch.Size([4834, 360, 360, 1])
y_test.shape is torch.Size([4834, 1])

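# torch.hstack cannot join 4-D images with 2-D targets, so pair them in a TensorDataset;
# permute moves the images from (N, H, W, C) to the (N, C, H, W) layout that Conv2d expects
training_set = torch.utils.data.TensorDataset(X_train.permute(0, 3, 1, 2), y_train)
validation_set = torch.utils.data.TensorDataset(X_test.permute(0, 3, 1, 2), y_test)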

training_loader = torch.utils.data.DataLoader(training_set, batch_size=64, shuffle=True, num_workers=2)

validation_loader = torch.utils.data.DataLoader(validation_set, batch_size=64, shuffle=False, num_workers=2)

Thank you in advance!

nivek · November 1, 2022, 3:52pm

Typically, users create a Dataset object and then use torch.utils.data.random_split to split it into train and test sets.
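
A minimal sketch of that approach, under some assumptions: the class name ImageRegressionDataset is hypothetical, the small random tensors stand in for the real X and Y from the question, and the seeded generator is only a rough counterpart to random_state=42 (it will not reproduce the same split as scikit-learn). The permute call assumes the images are stored channels-last, as the shapes in the question suggest.

```python
import torch
from torch.utils.data import Dataset, DataLoader, random_split


class ImageRegressionDataset(Dataset):
    """Hypothetical dataset pairing each image with one continuous target."""

    def __init__(self, images, targets):
        # Images arrive channels-last (N, H, W, 1); Conv2d expects (N, C, H, W)
        self.images = images.permute(0, 3, 1, 2).float()
        self.targets = targets.float()

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], self.targets[idx]


# Small random stand-ins for the real X and Y (the question uses 360x360 images)
X = torch.rand(30, 8, 8, 1)
Y = torch.rand(30, 1)

dataset = ImageRegressionDataset(X, Y)

# Mirror the 67/33 split; the seeded generator makes the split repeatable
n_val = int(0.33 * len(dataset))
n_train = len(dataset) - n_val
training_set, validation_set = random_split(
    dataset, [n_train, n_val], generator=torch.Generator().manual_seed(42)
)

training_loader = DataLoader(training_set, batch_size=8, shuffle=True)
validation_loader = DataLoader(validation_set, batch_size=8, shuffle=False)

# Each batch is an (images, targets) pair ready for the model and loss
images, targets = next(iter(training_loader))
```

With the real data, batch_size=64 and num_workers=2 from the question can be passed to DataLoader in the same way; they are left out here only to keep the sketch small.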

Here’s an additional tutorial that you may find helpful.