Sorry, I am still a novice in PyTorch, so this may be a naive question: I have collected a great deal of application data in a CSV file, but I have no idea how to load the .csv file into a PyTorch Dataset.
Alternatively, can I bypass the PyTorch Dataset and instead use the PyTorch DataLoader class to load the CSV data directly?
Feel free to check out my notebook.
You’ll learn how to load any type of data stored in csv files using torch.utils.data.Dataset and torch.utils.data.DataLoader.
It also includes the new TorchData (DataPipes) functionality in case you are interested.
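As a minimal sketch of the Dataset/DataLoader approach (assuming a CSV whose first column holds an integer label and whose remaining columns hold float features — adjust the indexing to your own schema; the file name `demo.csv` is just for illustration):

```python
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

class CsvDataset(Dataset):
    """Loads a CSV once into memory; first column = label, rest = features."""

    def __init__(self, csv_path):
        df = pd.read_csv(csv_path)
        self.labels = torch.tensor(df.iloc[:, 0].to_numpy(), dtype=torch.long)
        # Cast explicitly to float32 so torch.tensor never sees object dtype.
        self.features = torch.tensor(df.iloc[:, 1:].to_numpy(dtype="float32"))

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]

# Tiny demo file so the example runs end to end.
pd.DataFrame({"label": [0, 1, 0, 1],
              "f1": [0.1, 0.2, 0.3, 0.4],
              "f2": [1.0, 2.0, 3.0, 4.0]}).to_csv("demo.csv", index=False)

dataset = CsvDataset("demo.csv")
loader = DataLoader(dataset, batch_size=2, shuffle=True)
features, labels = next(iter(loader))
```

DataLoader only batches and shuffles; it always needs some Dataset (map-style or iterable) underneath, so you cannot skip that step entirely, but a small wrapper class like this is all it takes.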
I have a question regarding this: I need to load rows from .csv files, so I am using pandas.read_csv() and then .iloc[] to select the exact part I want, wrapping the result in np.array(). After loading all the files, I convert the whole collection to an np.array():
import glob
import numpy as np
import pandas as pd

file_path = glob.glob(file_path)
data_frames = np.array([np.array(pd.read_csv(file).iloc[3:, 1:]) for file in file_path])
I then group my data with labels in a Dataset class, where I convert the sequences with torch.tensor(sequence, dtype=torch.float32) and the labels with torch.tensor(label, dtype=torch.long), and call DataLoader() to split them into batches. But if I load only one row from each .csv file, then when I iterate through the train_loader I get an error message saying "can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint64, uint32, uint16, uint8, and bool."
file_path = glob.glob(file_path)
data_frames = np.array([np.array(pd.read_csv(file).iloc[3, 1:]) for file in file_path])
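That numpy.object_ error usually means the outer np.array() ended up with dtype=object, which happens when the per-file rows have different lengths (NumPy cannot stack ragged rows into a numeric 2-D array) or when some cells parsed as strings. A short sketch of the cause and the fix, assuming your rows are numeric and equal-length (an explicit cast with .astype(np.float32) surfaces any bad cell immediately instead of silently producing an object array):

```python
import numpy as np
import torch

# Equal-length numeric rows stack into a clean 2-D float array ...
rows = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])]
stacked = np.stack(rows).astype(np.float32)  # explicit numeric cast
t = torch.from_numpy(stacked)                # accepted: float32, shape (2, 3)

# ... whereas rows of unequal length can only be held as dtype=object,
# which is exactly what torch.tensor() rejects with the error above.
ragged = np.array([np.arange(3), np.arange(4)], dtype=object)
```

So before building tensors, check data_frames.dtype: if it prints object, inspect the shapes of the individual per-file arrays (and look for string columns from the CSV header rows you sliced with .iloc).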