How to load images in PyTorch

mudit · May 25, 2020, 5:08am

I am new to PyTorch. I have one folder which contains a train.csv file and a train folder. train.csv contains the image names with their corresponding labels, and train contains the images. How do I load the images and then train the model?

ptrblck · May 25, 2020, 5:31am

You could create a custom Dataset as explained in this tutorial.
In the Dataset.__init__ method you could load the corresponding csv file and load each sample in __getitem__ lazily. To do so, you could index the csv file (e.g. via a pd.DataFrame), load (and transform) the corresponding image and create the target tensor.
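
A minimal sketch of such a Dataset could look like the following. The class name `CsvImageDataset` and the column names `"Image"` and `"target"` are assumptions for illustration (adapt them to the actual header of your train.csv), and building the label-to-index mapping from the sorted unique labels is just one possible choice.

```python
import os

import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset


class CsvImageDataset(Dataset):
    """Hypothetical Dataset: reads file names and labels from a CSV, loads images lazily."""

    def __init__(self, csv_path, image_dir, transform=None,
                 image_col="Image", label_col="target"):
        # Only the (small) CSV is read up front; the images are not.
        # image_col and label_col are assumed column names.
        self.frame = pd.read_csv(csv_path)
        self.image_dir = image_dir
        self.transform = transform
        self.image_col = image_col
        self.label_col = label_col
        # Map each distinct label to an integer class index.
        classes = sorted(self.frame[label_col].unique())
        self.class_to_idx = {name: idx for idx, name in enumerate(classes)}

    def __len__(self):
        return len(self.frame)

    def __getitem__(self, index):
        # Look up the row for this sample and load its image from disk.
        row = self.frame.iloc[index]
        path = os.path.join(self.image_dir, row[self.image_col])
        image = Image.open(path).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        # Target is a class index stored as a 0-dim long tensor.
        target = torch.tensor(self.class_to_idx[row[self.label_col]], dtype=torch.long)
        return image, target
```

Without a transform, the returned image is a PIL image; to batch samples with a DataLoader you would usually pass a transform that ends in `torchvision.transforms.ToTensor()`.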

Let me know if you get stuck somewhere.

mudit · May 25, 2020, 5:13pm

Sir, my train.csv is in this form:
[Screenshot of train.csv, 2020-05-25 22-41-18]
Do I need to convert the target to a one-hot vector?
Also, what is idx in __getitem__(idx)?

ptrblck · May 26, 2020, 2:35am

If you are working on a multi-class classification, the targets should be the class indices and should not be one-hot encoded.
E.g. if your use case has 5 classes, the valid target values would be [0, 1, 2, 3, 4].

For your input data, you could use a mapping, such that e.g. manipuri maps to 0, odissi maps to 1, etc.
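
As a small sketch of that mapping: only manipuri and odissi are named in this thread, so the other three class names below are placeholders, and the logits are random stand-ins for a model output. It also shows that nn.CrossEntropyLoss takes the class indices directly.

```python
import torch
import torch.nn as nn

# Placeholder class list: only "manipuri" and "odissi" come from the thread.
class_names = ["manipuri", "odissi", "class_c", "class_d", "class_e"]
class_to_idx = {name: idx for idx, name in enumerate(class_names)}

# String labels as they might appear in the csv file.
labels = ["odissi", "manipuri", "odissi"]
targets = torch.tensor([class_to_idx[name] for name in labels], dtype=torch.long)

# nn.CrossEntropyLoss expects raw logits of shape (batch, num_classes)
# and integer class indices of shape (batch,), not one-hot vectors.
logits = torch.randn(len(labels), len(class_names))  # stand-in for model output
loss = nn.CrossEntropyLoss()(logits, targets)
```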

The Dataset.__getitem__(self, index) method is called by the DataLoader with an index for each sample in the range [0, len(dataset) - 1] and is responsible for loading and returning the sample at that index.
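
To see which indices arrive in __getitem__, a toy dataset that simply records them can help. `SquaresDataset` is a made-up example class, and the default num_workers=0 is used so the recording happens in the main process.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SquaresDataset(Dataset):
    """Toy dataset: sample i is the pair (i, i**2). Records every requested index."""

    def __init__(self, n):
        self.n = n
        self.seen = []

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        self.seen.append(index)  # remember which index the DataLoader asked for
        return torch.tensor(index), torch.tensor(index ** 2)


dataset = SquaresDataset(10)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

# One pass over the loader requests every index exactly once, in shuffled order.
batch_sizes = [x.shape[0] for x, y in loader]
```

After the loop, `sorted(dataset.seen)` contains each index from 0 to 9 once, so you never pass idx yourself; the DataLoader (through its sampler) does.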

mudit · June 6, 2020, 5:57pm

Sir, after all the loading I trained the model. The number of images in the train dataset is only 364, so I want all classes to be represented equally in my train and validation datasets. How can I do that? I am currently using mobilenet_v2 as the model. Since there are so few images, should I write my own model, or what can I do with this model to increase accuracy?

ptrblck · June 6, 2020, 10:11pm

364 images are not that many and your model might overfit quickly, especially since you would need to split this dataset into a training, validation, and test set.

You could try to use aggressive data augmentation and observe the validation loss to make sure the model still generalizes well.

I don’t think that a custom model trained from scratch would be easier, so your best bet might be to try to fine-tune a pretrained model, add data augmentation, maybe increase the regularization, or in the best case collect more data.

mudit · June 9, 2020, 7:57am

How do I stratify my image dataset so that my train and validation datasets have the same class proportions? (I have 8 classes, and every class must be present in both train and validation in the same ratio.)
Thanks for your help.

ptrblck · June 10, 2020, 6:54am

You could use sklearn.model_selection.train_test_split with the stratify argument.
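If you pass the sample indices as the array to split (and the labels as the stratify argument), this would return the indices for the training and validation datasets, which you could then pass to a Subset or SubsetRandomSampler.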
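
A minimal sketch of this split, using a random TensorDataset with 8 balanced classes as a stand-in for the real image dataset. The 80/20 ratio and random_state are arbitrary choices; with your own data, `labels` would be the label column of train.csv.

```python
import numpy as np
import torch
from sklearn.model_selection import train_test_split
from torch.utils.data import Subset, TensorDataset

# Stand-in data: 8 classes with 10 samples each.
labels = np.repeat(np.arange(8), 10)
dataset = TensorDataset(torch.randn(len(labels), 3, 8, 8), torch.from_numpy(labels))

# Split the indices, not the data, keeping class proportions equal in both parts.
indices = np.arange(len(labels))
train_idx, val_idx = train_test_split(
    indices, test_size=0.2, stratify=labels, random_state=0
)

# Wrap the same underlying dataset in two index-based views.
train_set = Subset(dataset, train_idx.tolist())
val_set = Subset(dataset, val_idx.tolist())
```

Note that both subsets share the underlying dataset's transform. If you want augmentation on the training part only, one option is to create two Dataset instances with different transforms and wrap each in its own Subset.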