How to create a custom dataset for training a NN

misiak · May 4, 2020, 4:56pm

Hey guys,
I'm new to developing with PyTorch and I would appreciate it if someone could help me create my own dataset. I checked many topics and forums, but it seemed like no one was in the same situation. I have data with the following directory hierarchy:

I also have a CSV file which, as you can see, is called test-480_ImgPats. This CSV contains the paths to the images, which look like this (the screenshot will be in the comment below).
I have also created a directory called “labels”, which holds the corresponding class (0, 1, 2, 3, 4) for each image.
There is also additional data for each image, called text-480_data, which has 4 numbers per image.

For example, the directory Test contains another directory called 4_24, which holds the image 1.jpg. The label created for this image is at labels/Test/4_24/1.jpg and is one of the classes 0-4. Every image has metadata (in a CSV file called Test-480_data) consisting of 4 comma-separated numbers, and every image is listed in the CSV with paths.

How can I create my own dataset with these annotations?
I hope that someone will understand.
Thank you, everyone!

misiak · May 4, 2020, 4:56pm

This is screenshot 2.
I forgot to mention that each image is 480x360 px, RGB.

ptrblck · May 5, 2020, 6:52am

I’m not sure I understand the data structure completely, but I would recommend using e.g. pandas and making sure you can load all images with their labels.
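
A rough sketch of such a sanity check could look like the following. It assumes (none of this is confirmed in the thread) that the path CSV has no header row and one relative image path per row, and that each label sits at labels/<image path>. The function name `find_missing_files` and its parameters are hypothetical.

```python
from pathlib import Path

import pandas as pd


def find_missing_files(paths_csv, data_root=".", labels_root="labels"):
    """Return image paths whose image or label file is missing on disk.

    Assumptions (not confirmed in the thread): the CSV has no header row and
    one relative image path per row, and the label for an image is stored at
    data_root/labels_root/<image path>.
    """
    paths = pd.read_csv(paths_csv, header=None, names=["image_path"])
    missing = []
    for rel in paths["image_path"]:
        image_file = Path(data_root) / rel
        label_file = Path(data_root) / labels_root / rel
        # Report the entry if either the image or its label is absent.
        if not (image_file.is_file() and label_file.is_file()):
            missing.append(rel)
    return missing
```

An empty result would mean every listed image and its label file were found under `data_root`.
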
Once you’ve created the mapping between the image paths, their metadata, and targets, I would create a csv (or any other format you like), such that indexing this file will give you everything you need to load the data sample and create or load the target.
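
One way this combined file might be built with pandas is sketched below. It assumes that both CSV files have no header row and list the images in the same order, and that each label lives at labels/<image path>; the thread does not confirm any of this. The function name `build_index` and the column names `m0`..`m3` are made up for illustration.

```python
from pathlib import Path

import pandas as pd


def build_index(paths_csv, meta_csv, labels_root="labels"):
    """Combine image paths, metadata, and label paths into one DataFrame.

    Assumptions (not confirmed in the thread): both CSV files have no header
    row, list the images in the same order, and each label is stored at
    labels_root/<image path>.
    """
    paths = pd.read_csv(paths_csv, header=None, names=["image_path"])
    meta = pd.read_csv(meta_csv, header=None, names=["m0", "m1", "m2", "m3"])
    if len(paths) != len(meta):
        raise ValueError("path and metadata files have different lengths")

    # Row i of the result describes image i: its path, 4 numbers, and label.
    index = pd.concat([paths, meta], axis=1)
    index["label_path"] = [
        (Path(labels_root) / p).as_posix() for p in index["image_path"]
    ]
    return index
```

The result could then be saved once with `index.to_csv("index.csv", index=False)`, so that a single row lookup yields everything needed for one sample.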

If that’s done, you could load this new csv in Dataset.__init__, and lazily load each sample in its __getitem__.
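
A minimal sketch of such a Dataset, under some assumptions: the index CSV has the hypothetical columns `image_path`, `m0`..`m3`, and `label_path`, and the target is produced by a user-supplied `load_target` function, since the thread does not say how the label files are encoded. The class name `IndexedImageDataset` is also made up.

```python
import numpy as np
import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset


class IndexedImageDataset(Dataset):
    """Lazily loads (image, metadata, target) samples listed in an index CSV.

    Assumed (hypothetical) columns: image_path, m0..m3, label_path.
    """

    def __init__(self, index_csv, load_target):
        # Only the small index table is read here; no images are opened yet.
        self.index = pd.read_csv(index_csv)
        # load_target is supplied by the caller because the encoding of the
        # label files is not specified in the thread.
        self.load_target = load_target

    def __len__(self):
        return len(self.index)

    def __getitem__(self, i):
        row = self.index.iloc[i]
        # Open and decode the image only when this sample is requested.
        with Image.open(row["image_path"]) as img:
            pixels = np.array(img.convert("RGB"))  # H x W x 3, uint8
        # Convert to a float tensor in C x H x W layout, scaled to [0, 1].
        image = torch.from_numpy(pixels).permute(2, 0, 1).float() / 255.0
        meta = torch.tensor(
            [float(row[c]) for c in ("m0", "m1", "m2", "m3")],
            dtype=torch.float32,
        )
        target = self.load_target(row["label_path"])
        return image, meta, target
```

An instance can be passed to `torch.utils.data.DataLoader` for batching and shuffling. For the 480x360 RGB images mentioned above, each `image` tensor would have shape (3, 360, 480).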