Loading a CSV with a column of strings and a column of integers

Andrei_Cristea · May 8, 2022, 2:32pm

Regarding __getitem__, you can customize it to return whatever you want to use in your training loop. In your case, you might try something like this:

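import pandas as pd
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, csv_file):
        # header=None: the file has no header row, so the columns are labeled 0 and 1
        self.data = pd.read_csv(csv_file, header=None)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        row = self.data.iloc[idx]
        # Column 0 holds the string, column 1 holds the integer
        sample = {'text': row[0], 'number': row[1]}
        return sample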

dataset = CustomDataset("/content/Q_V_1.08.csv")
for foo in dataset:
    print(foo["text"], foo["number"])

Output:
alpha 100
bravo 200
charlie 300
delta 400
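
If you then wrap the dataset in a DataLoader, the default collation batches each dict field separately: strings come back as a list, integers as a tensor. Here is a minimal sketch of that. ToyDataset is a hypothetical stand-in for CustomDataset so it runs without the CSV file, and batch_size=2 is just an assumed value:

```python
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for CustomDataset; holds the same four rows in memory."""

    def __init__(self):
        self.rows = [("alpha", 100), ("bravo", 200), ("charlie", 300), ("delta", 400)]

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        text, number = self.rows[idx]
        return {"text": text, "number": number}


# shuffle=False keeps the original row order; batch_size=2 is an assumed value
loader = DataLoader(ToyDataset(), batch_size=2, shuffle=False)

for batch in loader:
    # batch["text"] is a list of str, batch["number"] is an integer tensor
    print(batch["text"], batch["number"])
```

In a real training loop you would usually still need to tokenize or encode batch["text"] yourself, since the default collation leaves strings untouched.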

Regarding the train/test split, you might do this, which I found by Googling “pytorch train test split”.
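
One common way to split a Dataset into training and test subsets is torch.utils.data.random_split. The following is a minimal sketch of that and is not necessarily the approach mentioned above. ToyDataset is a hypothetical stand-in for CustomDataset, and the 75/25 ratio and the seed are assumptions you would adjust:

```python
import torch
from torch.utils.data import Dataset, random_split


class ToyDataset(Dataset):
    """Hypothetical stand-in for CustomDataset, so the sketch runs without a CSV file."""

    def __init__(self):
        self.rows = [("alpha", 100), ("bravo", 200), ("charlie", 300), ("delta", 400)]

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        text, number = self.rows[idx]
        return {"text": text, "number": number}


dataset = ToyDataset()

# Assumed 75/25 split; the two sizes must add up to len(dataset)
train_size = int(0.75 * len(dataset))
test_size = len(dataset) - train_size

# A seeded generator makes the random assignment reproducible
generator = torch.Generator().manual_seed(0)
train_set, test_set = random_split(dataset, [train_size, test_size], generator=generator)

print(len(train_set), len(test_set))
```

random_split returns Subset objects, so train_set and test_set can be indexed or passed to a DataLoader just like the original dataset.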