Previously I was using PyTorch to split my dataset and train my classifier, but now I want to use scikit-learn to train an SVM model. For that, I need to split my dataset into train and test sets. scikit-learn uses xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=42)
to split. This is what I am currently using to split my data:
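import torch
from torchvision import datasets
from google.colab import drive
# Mount Google Drive so the dataset folders are visible in Colab
drive.mount('/content/drive')
data = "/content/drive/My Drive/AMD_new"
# transform_train and transform_test are assumed to be defined earlier (not shown here)
train_data = datasets.ImageFolder(data + "/train", transform=transform_train)
test_data = datasets.ImageFolder(data + "/val", transform=transform_test)
#n_classes = test_data.shape[1]
# ImageFolder derives one class per subfolder name
n_classes = len(test_data.classes)
print(n_classes)
batch_size = 32
dataloader_train = torch.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=True, num_workers=2)
dataloader_test = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=2)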
The data consists of 4 labeled folders of images uploaded to Google Drive, and I am working in Google Colab. Can anyone tell me how to split the data into xtrain, xtest, ytrain, and ytest? Should I connect xtest to my val folder and xtrain to my train folder? Then what about ytrain and ytest? I am a little confused. Please help me solve this. Thanks.
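To make the question concrete, here is a rough sketch of the mapping I have in mind. It is not a confirmed solution. It assumes each dataset item is an (image, label) pair, as ImageFolder returns, that all images have the same size after the transform, and that the images are CPU tensors or arrays that np.asarray can convert. The helper dataset_to_arrays and the stand-in fake_folder are hypothetical names I made up for illustration. With the real data, train_data and test_data would be passed in instead of the fake folders.

```python
import numpy as np
from sklearn.svm import SVC


def dataset_to_arrays(dataset):
    """Turn a dataset of (image, label) pairs into a 2-D feature matrix and a label vector."""
    features, labels = [], []
    for i in range(len(dataset)):
        image, label = dataset[i]
        # Flatten each image into one row; scikit-learn expects shape (n_samples, n_features)
        features.append(np.asarray(image, dtype=np.float32).ravel())
        labels.append(label)
    return np.stack(features), np.asarray(labels)


# Hypothetical stand-in for datasets.ImageFolder: a list of (image, label) pairs
rng = np.random.default_rng(0)


def fake_folder(n_images, n_classes=4):
    labels = [i % n_classes for i in range(n_images)]
    return [(rng.random((3, 8, 8)) + label, label) for label in labels]


# The train folder would supply xtrain/ytrain, the val folder xtest/ytest
xtrain, ytrain = dataset_to_arrays(fake_folder(40))
xtest, ytest = dataset_to_arrays(fake_folder(12))

clf = SVC(kernel="rbf")
clf.fit(xtrain, ytrain)
accuracy = clf.score(xtest, ytest)
print(xtrain.shape, ytrain.shape, accuracy)
```

If that mapping is right, then because the folders are already split into train and val, train_test_split would not be needed at all: the y arrays are just the folder-derived class indices.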