Regarding __getitem__, you can customize it to return whatever you want to use in your training loop. For example, in your case you could try something like this:
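import pandas as pd
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, csv_file):
        # header=None: the CSV has no header row, so columns are labelled 0, 1, ...
        self.data = pd.read_csv(csv_file, header=None)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        row = self.data.iloc[idx]
        # Return whatever the training loop needs; here, a dict with both columns.
        sample = {'text': row[0], 'number': row[1]}
        return sample

dataset = CustomDataset("/content/Q_V_1.08.csv")
for foo in dataset:
    print(foo["text"], foo["number"])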
Output:
alpha 100
bravo 200
charlie 300
delta 400
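To see how dict samples like these would reach a training loop, here is a minimal sketch that wraps the same kind of dataset in a DataLoader. The class name InMemoryDataset and the in-memory DataFrame are stand-ins I made up so the snippet runs without the CSV file; the rows just mirror the output above. It relies on the default collate function, which batches each dict key separately.

```python
import pandas as pd
from torch.utils.data import Dataset, DataLoader

class InMemoryDataset(Dataset):
    """Hypothetical stand-in for CustomDataset, backed by a DataFrame instead of a CSV."""
    def __init__(self, frame):
        self.data = frame

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        row = self.data.iloc[idx]
        # Cast to plain Python types so the default collate function can batch them.
        return {'text': str(row[0]), 'number': int(row[1])}

# Assumed sample data, mirroring the four rows printed above.
frame = pd.DataFrame([["alpha", 100], ["bravo", 200], ["charlie", 300], ["delta", 400]])
loader = DataLoader(InMemoryDataset(frame), batch_size=2, shuffle=False)

batches = list(loader)
for batch in batches:
    # 'text' arrives as a list of strings, 'number' as a tensor.
    print(batch['text'], batch['number'])
```

With batch_size=2 and shuffle=False you should get two batches, each carrying a list of strings under 'text' and an integer tensor under 'number'. If you need the text as tensors too, that conversion (tokenizing, etc.) would go in __getitem__ or in a custom collate_fn.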
Regarding the train/test split, you might do the following, which I found by Googling “pytorch train test split”.