Setting Up Image Dataset to feed into Dataloader (Beginner)

obeavers · December 4, 2017, 6:02pm

Hello,

I’ve used some of the pre-defined datasets in PyTorch, and now I’m trying to move on to creating my own project using transfer learning.

What I’m slightly confused about is how to set up the dataset. I’ll be training a model based on ResNet18 for image classification (two classifying features).

As I set up the scrape for the images, I’m trying to figure out how to name/organize the training set correctly.

Could someone confirm whether this would be an appropriate way to set up the data:

CSV file with following information:

Image ID
Classification # 1
Classification # 2

And then name the corresponding JPEGs as their ImageIDs as defined in CSV file?

Then, using the DataLoader class, I’d ideally be able to link the two together, as I think I’ve seen done.

Is that an implementable workflow? Apologies for such a newbie question - there is a lot of data to get, and I’d like to plan it out right the first time. I’ve done a bit of reading, and think this should work, but just wanted to get confirmation.

Thanks!

SimonW · December 4, 2017, 8:47pm

I believe the easiest way is to implement a Dataset class around the CSV file you described. The class only needs to provide a method that fetches the data at a given index. http://pytorch.org/docs/master/data.html#torch.utils.data.Dataset
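
To make that concrete, here is a rough sketch of what such a class could look like. It is only an illustration under some assumptions that are not from the thread: the class name CsvImageDataset and the helper pil_loader are made up, the CSV is assumed to have a header row with columns named image_id, label_1 and label_2, both labels are assumed to be integers, and the images are assumed to be stored as <image_id>.jpg in a single folder. Adjust those to match your actual files.

```python
import csv
import os

try:
    from torch.utils.data import Dataset
except ImportError:
    # Fallback so the CSV logic can be tried without PyTorch installed.
    Dataset = object


def pil_loader(path):
    # Hypothetical helper: opens an image with Pillow and forces RGB.
    # Imported lazily so the rest of the sketch does not require Pillow.
    from PIL import Image
    with open(path, "rb") as f:
        return Image.open(f).convert("RGB")


class CsvImageDataset(Dataset):
    """Pairs each CSV row with the image file <image_dir>/<image_id>.jpg.

    Assumed CSV header (not from the original post): image_id,label_1,label_2
    """

    def __init__(self, csv_path, image_dir, transform=None, loader=pil_loader):
        self.image_dir = image_dir
        self.transform = transform
        self.loader = loader
        with open(csv_path, newline="") as f:
            reader = csv.DictReader(f)
            # Keep only the small label table in memory; images load lazily.
            self.rows = [
                (row["image_id"], int(row["label_1"]), int(row["label_2"]))
                for row in reader
            ]

    def __len__(self):
        # Number of samples, which DataLoader uses to build index batches.
        return len(self.rows)

    def __getitem__(self, index):
        # Fetch one sample: load the image and return it with both labels.
        image_id, label_1, label_2 = self.rows[index]
        image = self.loader(os.path.join(self.image_dir, image_id + ".jpg"))
        if self.transform is not None:
            image = self.transform(image)
        return image, label_1, label_2
```

An instance of this class can then be passed to torch.utils.data.DataLoader, for example with batch_size=32 and shuffle=True. For the default batching to work, the transform would need to turn each image into a tensor of a fixed size, for instance a torchvision resize followed by transforms.ToTensor().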