How to set up a custom Dataset to work with PyTorch

Gerrit · January 27, 2021, 11:36pm

I am trying to create a Dataset class that should later work like any other standard PyTorch Dataset. The raw data contains measurements from accelerometers that were attached to a gearbox. Data was collected in 560 different runs while the health of the gearbox was degrading between the runs. I already extracted 16 features for each run, which I would like to use as input data for the neural network (deep feedforward). The point I am struggling with is how to generate a dataset of these runs. I followed this tutorial, but got stuck because my data is two-dimensional, with the different runs arranged in rows and the respective features in columns. This is the code I have so far, but I don't know how to access the different rows.

import os
import pandas as pd
from torch.utils.data import Dataset

class Features(Dataset):
    def __init__(self):
        folderpath = r'...\Balanced_Data'
        data_path = os.path.join(folderpath, 'good_and_bad.csv')
        data = pd.read_csv(data_path, header=None)
        self.samples = data
    
    def __len__(self):
        return len(self.samples)
    
    def __getitem__(self, idx):
        return self.samples.at[idx,0]

When trying to access the single rows,

dataset.samples[0]

yields all entries from the first column.

Since I am fairly new to this topic, I would very much appreciate any help and tips on a general approach to this.

Cheers,
Gerrit

Dwight_Foster · January 28, 2021, 12:34am

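If you use the pandas .iloc indexer here (square brackets, e.g. self.samples.iloc[idx]), you can pass in an index and get back the whole row. With header=None the columns are labelled 0, 1, 2, ..., so samples[0] selects the column labelled 0 rather than the first row. The documentation is found here.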
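A minimal sketch of how that could look in the Dataset from the question. It assumes every column of the CSV is a numeric feature (if one column holds a label, it would need to be split off), and it uses a random 560 x 16 DataFrame as a stand-in for the real file. Passing the DataFrame into the constructor is a choice made here for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader


class Features(Dataset):
    """One sample per row (one run); the features are in the columns."""

    def __init__(self, frame):
        # frame: a DataFrame such as pd.read_csv(data_path, header=None)
        self.samples = frame

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # .iloc[idx] selects the row at position idx;
        # self.samples[0] would select the column labelled 0 instead.
        row = self.samples.iloc[idx]
        # Assumption: all columns are numeric features.
        return torch.tensor(row.to_numpy(dtype=np.float32))


# Stand-in for the real CSV: 560 runs x 16 features of random numbers.
frame = pd.DataFrame(np.random.rand(560, 16))
dataset = Features(frame)

# The dataset can then be wrapped in a DataLoader like any other Dataset.
loader = DataLoader(dataset, batch_size=32, shuffle=True)
batch = next(iter(loader))  # tensor with one row per run in the batch
```

Converting each row to a float32 tensor in __getitem__ lets the default DataLoader collation stack the rows into a single batch tensor.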

Gerrit · January 28, 2021, 9:13am

That worked,
thanks a lot