Hi!
I’m trying to apply a model to an industrial dataset. The dataset is a CSV file with one observation per line and roughly 100 columns of variables.
Here is the code I wrote after looking at the available tutorials:
from torch.utils.data.dataset import Dataset
from torch.utils.data.dataset import TensorDataset
import matplotlib
import pandas as pd
from torchvision import transforms
import numpy as np

# tutorial from https://github.com/utkuozbulak/pytorch-custom-dataset-examples
class datasetIndustrialSensor(TensorDataset):
    # For this custom dataset I will have access to a variety of industrial sensors
    # I will try to see how good is the model applied to those observations
    def __init__(self, csv_path):
        """
        Args:
            csv_path (string): path to csv file
            transform: pytorch transforms for transforms and tensor conversion
        """
        # Read the csv file
        self.data_info = pd.read_csv(csv_path)

    def __getitem__(self, index):
        # Note : skipping first useless column
        obs = self.data_info.iloc[index, 2:].as_matrix()
        obs = obs.astype('float')
        sample = {'obs': obs}
        return sample

    def __len__(self):
        return len(self.data_info)
In the actual model, I begin by initializing this custom dataset:
from utils.custom_dset import datasetIndustrialSensor as DatasetC
dset = DatasetC("path/to/file.csv")
Finally, I call the DataLoader utility, hoping to get a train loader and a test loader:
train_loader, test_loader = DataLoader(dset, batch_size=256, shuffle=True)
However, this throws an error:
ValueError: too many values to unpack (expected 2)
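For reference, the same kind of message can be reproduced without PyTorch. This is only a minimal sketch, assuming a loader behaves like a plain iterable that yields one batch at a time; `FakeLoader` is a hypothetical stand-in written for illustration, not a PyTorch class:

```python
class FakeLoader:
    """Hypothetical stand-in for a loader: iterating yields one batch at a time."""

    def __init__(self, n_samples, batch_size):
        self.n_samples = n_samples
        self.batch_size = batch_size

    def __iter__(self):
        # Yield consecutive index ranges, one list per batch.
        for start in range(0, self.n_samples, self.batch_size):
            stop = min(start + self.batch_size, self.n_samples)
            yield list(range(start, stop))


loader = FakeLoader(n_samples=1000, batch_size=256)  # 4 batches in total

try:
    # Tuple unpacking iterates over the object, so Python sees 4 batches
    # where the left-hand side only has room for 2 names.
    train_loader, test_loader = loader
except ValueError as err:
    message = str(err)
    print(message)
```

If this assumption holds, the error would come from the unpacking on the left-hand side of the assignment rather than from `__getitem__`.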
I tried a few variations on the __getitem__ function, without success. Does anyone have a clue how to tackle this issue?
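One direction I am considering is to build the train/test split myself and create one loader per part. The sketch below only covers the index split, in plain Python; `split_indices`, the 20% test fraction, and the sample count of 1000 are assumptions made up for illustration, not something taken from the tutorial:

```python
import random


def split_indices(n_samples, test_fraction=0.2, seed=0):
    """Hypothetical helper: shuffle range(n_samples) and cut it in two lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_test = int(n_samples * test_fraction)
    # The first n_test shuffled indices go to the test set, the rest to training.
    return indices[n_test:], indices[:n_test]


# 1000 is a placeholder; in practice this would be len(dset).
train_idx, test_idx = split_indices(1000, test_fraction=0.2)

# With PyTorch, each index list could then be wrapped separately, e.g.:
#   from torch.utils.data import DataLoader, Subset
#   train_loader = DataLoader(Subset(dset, train_idx), batch_size=256, shuffle=True)
#   test_loader = DataLoader(Subset(dset, test_idx), batch_size=256, shuffle=False)
```

The point of the design is that each `DataLoader` call returns a single loader, so two parts of the data need two separate calls.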
Thanks!