Sorry, I am still a novice in PyTorch, so this may be a naive question: I have collected a great deal of application data in a CSV file, but I have no idea how to load the .csv file into a PyTorch Dataset.
Alternatively, can I bypass the PyTorch Dataset and instead use the PyTorch DataLoader class to load the CSV data directly?
Feel free to check out my notebook.
You’ll learn how to load any type of data stored in csv files using torch.utils.data.Dataset and torch.utils.data.DataLoader.
It also includes the new TorchData (DataPipes) functionality in case you are interested.
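As a minimal sketch of the Dataset/DataLoader approach (assuming a CSV whose first column holds an integer label and whose remaining columns hold float features — adjust the indexing to your own schema; the file name `demo.csv` is just for illustration):

```python
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

class CsvDataset(Dataset):
    """Loads a CSV once into memory; first column = label, rest = features."""

    def __init__(self, csv_path):
        df = pd.read_csv(csv_path)
        self.labels = torch.tensor(df.iloc[:, 0].to_numpy(), dtype=torch.long)
        # Cast explicitly to float32 so torch.tensor never sees object dtype.
        self.features = torch.tensor(df.iloc[:, 1:].to_numpy(dtype="float32"))

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]

# Tiny demo file so the example runs end to end.
pd.DataFrame({"label": [0, 1, 0, 1],
              "f1": [0.1, 0.2, 0.3, 0.4],
              "f2": [1.0, 2.0, 3.0, 4.0]}).to_csv("demo.csv", index=False)

dataset = CsvDataset("demo.csv")
loader = DataLoader(dataset, batch_size=2, shuffle=True)
features, labels = next(iter(loader))
```

DataLoader only batches and shuffles; it always needs some Dataset (map-style or iterable) underneath, so you cannot skip that step entirely, but a small wrapper class like this is all it takes.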
I have a question regarding this: I need to load rows from .csv files, so I am using pandas.read_csv() and then .iloc[] to select the exact part I want, wrapping the result in np.array(). After loading all the files, I convert the whole collection to an np.array():
import glob
import numpy as np
import pandas as pd

file_path = glob.glob(file_path)
data_frames = np.array([np.array(pd.read_csv(file).iloc[3:, 1:]) for file in file_path])
I then group my data with labels in a Dataset class, where I convert the sequences with torch.tensor(sequence, dtype=torch.float32) and the labels with torch.tensor(label, dtype=torch.long), and call DataLoader() to split them into batches. But if I load only one row from each .csv file, then when I iterate through the train_loader I get an error message saying "can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint64, uint32, uint16, uint8, and bool."
file_path = glob.glob(file_path)
data_frames = np.array([np.array(pd.read_csv(file).iloc[3, 1:]) for file in file_path])
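That numpy.object_ error usually means the outer np.array() ended up with dtype=object, which happens when the per-file rows have different lengths (NumPy cannot stack ragged rows into a numeric 2-D array) or when some cells parsed as strings. A short sketch of the cause and the fix, assuming your rows are numeric and equal-length (an explicit cast with .astype(np.float32) surfaces any bad cell immediately instead of silently producing an object array):

```python
import numpy as np
import torch

# Equal-length numeric rows stack into a clean 2-D float array ...
rows = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])]
stacked = np.stack(rows).astype(np.float32)  # explicit numeric cast
t = torch.from_numpy(stacked)                # accepted: float32, shape (2, 3)

# ... whereas rows of unequal length can only be held as dtype=object,
# which is exactly what torch.tensor() rejects with the error above.
ragged = np.array([np.arange(3), np.arange(4)], dtype=object)
```

So before building tensors, check data_frames.dtype: if it prints object, inspect the shapes of the individual per-file arrays (and look for string columns from the CSV header rows you sliced with .iloc).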