Hello
I have a custom dataset that looks like this:
age  gender  genre (output)
---  ------  --------------
20   1       HipHop
26   1       Jazz
31   0       Classical
20   0       Dance
Previously, I split the data like this:
X = df.drop(columns=["genre"])
y = df["genre"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
Now I would like to split the data into training, validation, and test sets using random_split, so I changed the code to this:
fileName = "dataset.csv"
custom_dataset = None
with open(fileName, 'r') as f:
    custom_dataset = f.readlines()
client_dataset = torch.utils.data.Subset(custom_dataset, range(0, len(custom_dataset), 2))
val_samples_num = int(len(client_dataset) * valset_ratio)
test_samples_num = int(len(client_dataset) * testset_ratio)
train_samples_num = len(client_dataset) - val_samples_num - test_samples_num
trainset, valset, testset = random_split(client_dataset, [train_samples_num, val_samples_num, test_samples_num])
But I keep getting this error:
for x, y in dataloader:
ValueError: too many values to unpack (expected 2)
I suspect that the way I load the custom dataset in the middle of the routine is wrong, and I would appreciate any advice.
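My current guess is that f.readlines() gives a plain list of strings, so each sample is one raw text line rather than a (features, label) pair, and the DataLoader then hands back a batch of strings that cannot be unpacked into x and y. Below is a minimal sketch of what I think the dataset would need to look like instead. GenreDataset is a hypothetical name, and the inline CSV text, its header row, and the 0.25 test ratio are assumptions for illustration only, not my real file or settings.

```python
import io

import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset, random_split


class GenreDataset(Dataset):
    """Hypothetical wrapper that yields (features, label) tensor pairs."""

    def __init__(self, df):
        # Features: every column except the target.
        self.X = torch.tensor(
            df.drop(columns=["genre"]).to_numpy(), dtype=torch.float32
        )
        # Encode the string labels as integer class ids (order of appearance).
        codes, self.classes = pd.factorize(df["genre"])
        self.y = torch.tensor(codes, dtype=torch.long)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        # Returning a 2-tuple is what makes `for x, y in dataloader` work.
        return self.X[idx], self.y[idx]


# Stand-in for pd.read_csv("dataset.csv"); the header row is an assumption.
csv_text = "age,gender,genre\n20,1,HipHop\n26,1,Jazz\n31,0,Classical\n20,0,Dance\n"
df = pd.read_csv(io.StringIO(csv_text))

dataset = GenreDataset(df)
test_samples_num = int(len(dataset) * 0.25)  # assumed ratio
train_samples_num = len(dataset) - test_samples_num
trainset, testset = random_split(dataset, [train_samples_num, test_samples_num])

for x, y in DataLoader(trainset, batch_size=2):
    print(x.shape, y.shape)
```

The same pattern should extend to a three-way split by passing three lengths to random_split, as in my code above.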
Thanks