Dataloader error for custom dataset

Hello

I have a custom dataset that looks like this:

age gender  genre(output)
--  ------  -------------
20  1       HipHop
26  1       Jazz
31  0       Classical
20  0       Dance
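
On disk this is a plain CSV, so I assume dataset.csv looks roughly like this (comma-separated, with a header row matching the column names I use below):

age,gender,genre
20,1,HipHop
26,1,Jazz
31,0,Classical
20,0,Dance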

Previously, I used this method for splitting:

from sklearn.model_selection import train_test_split

X = df.drop(columns=["genre"])
y = df["genre"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

Now I would like to split the data into training, validation, and test sets using random_split, so I changed the code to this:

import torch
from torch.utils.data import random_split

fileName = "dataset.csv"

# Read the raw CSV lines; each element is a plain string, including the header row
with open(fileName, 'r') as f:
    custom_dataset = f.readlines()

# Take every other line as this client's share of the data
client_dataset = torch.utils.data.Subset(custom_dataset, range(0, len(custom_dataset), 2))

# valset_ratio and testset_ratio are defined earlier in my routine
val_samples_num = int(len(client_dataset) * valset_ratio)
test_samples_num = int(len(client_dataset) * testset_ratio)
train_samples_num = len(client_dataset) - val_samples_num - test_samples_num

trainset, valset, testset = random_split(client_dataset, [train_samples_num, val_samples_num, test_samples_num])

But I keep getting this error:

for x, y in dataloader:
ValueError: too many values to unpack (expected 2)

I believe the way I build the custom dataset in the middle of the routine is not right, and I would appreciate any advice.
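
From what I understand, the DataLoader needs every sample to already be an (x, y) pair, so I suspect the fix is a small Dataset class that parses each row into features and a label. Here is a rough sketch of what I have in mind (the class name, the pandas loading, and the integer label mapping are my own guesses, not tested):

import pandas as pd
import torch
from torch.utils.data import Dataset

class GenreDataset(Dataset):
    # Minimal sketch: returns one (features, label) pair per CSV row
    def __init__(self, csv_path):
        df = pd.read_csv(csv_path)
        # Map the string genre labels to integer class indices
        classes = sorted(df["genre"].unique())
        class_to_idx = {c: i for i, c in enumerate(classes)}
        self.x = torch.tensor(df[["age", "gender"]].values, dtype=torch.float32)
        self.y = torch.tensor(df["genre"].map(class_to_idx).values, dtype=torch.long)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        # Returning a tuple is what lets "for x, y in dataloader" unpack cleanly
        return self.x[idx], self.y[idx]

Is something along these lines what is expected here?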

Thanks

I executed your code blocks with only the imports below, and I don't get any error like the ones you mentioned. It works, but "client_dataset" ends up with only half of the data from the CSV file. For example, with 6 rows in dataset.csv, trainset, valset, and testset each contain 1 sample, so only 3 of the 6 rows are processed.

from torch.utils.data.dataset import random_split
import torch
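
To show what I mean, a quick toy check with the imports above (my own stand-in list, not the real data) confirms that Subset with a step of 2 keeps only every other element:

data = ["row0", "row1", "row2", "row3", "row4", "row5"]   # stand-in for 6 CSV lines
subset = torch.utils.data.Subset(data, range(0, len(data), 2))
print(len(subset))                                  # 3 -> only half of the 6 rows
print([subset[i] for i in range(len(subset))])      # ['row0', 'row2', 'row4']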
