I am new to PyTorch. I have one folder that contains a train.csv file and a train folder. train.csv contains the image names with their corresponding labels, and the train folder contains the images. How do I load the images and then train the model?
You could create a custom Dataset as explained in this tutorial.
In the Dataset.__init__ method you could load the corresponding csv file, and then load each sample lazily in __getitem__. To do so, you could index the csv file (e.g. via a pd.DataFrame), load (and transform) the corresponding image, and create the target tensor.
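A minimal sketch of such a Dataset could look like the following. The class name CsvImageDataset and the column names "Image" and "target" are assumptions for illustration and would need to match the actual csv header; the string labels are mapped to class indices in alphabetical order.

```python
import os

import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset


class CsvImageDataset(Dataset):
    """Reads (image name, label) rows from a csv and loads the images lazily."""

    def __init__(self, csv_path, image_dir, transform=None,
                 image_col="Image", label_col="target"):
        # Column names are assumptions; adjust them to your csv header.
        self.df = pd.read_csv(csv_path)
        self.image_dir = image_dir
        self.transform = transform
        self.image_col = image_col
        self.label_col = label_col
        # Map each class name to an integer class index (alphabetical order).
        self.classes = sorted(self.df[label_col].unique())
        self.class_to_idx = {name: i for i, name in enumerate(self.classes)}

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        # Only the requested row is touched, so images are loaded on demand.
        row = self.df.iloc[idx]
        path = os.path.join(self.image_dir, row[self.image_col])
        image = Image.open(path).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        target = torch.tensor(self.class_to_idx[row[self.label_col]],
                              dtype=torch.long)
        return image, target
```

It could then be created via CsvImageDataset("train.csv", "train", transform=...) and wrapped in a DataLoader. Passing a transform that ends with ToTensor() would be needed so that the default collate function can stack the images into a batch.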
Let me know if you get stuck somewhere.
Sir, my train.csv is in this form.
Do I need to convert the target to a one-hot vector?
Also, what is idx in __getitem__(idx)?
If you are working on a multi-class classification use case, the targets should be the class indices and should not be one-hot encoded.
E.g. if your use case has 5 classes, the valid target values would be [0, 1, 2, 3, 4].
For your input data, you could use a mapping such that e.g. manipuri maps to 0, odissi maps to 1, etc.
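As a small sketch of this mapping: the class list below contains only the two names mentioned here and would have to be extended with the remaining classes. Using nn.CrossEntropyLoss as the criterion is an assumption, since the loss function was not mentioned; it expects class indices as targets, not one-hot vectors.

```python
import torch
import torch.nn as nn

# Only the two names from the example; extend with the remaining classes.
class_names = ["manipuri", "odissi"]
class_to_idx = {name: idx for idx, name in enumerate(class_names)}

# String labels as they might be read from the csv (made-up sample values).
labels = ["odissi", "manipuri", "odissi"]
targets = torch.tensor([class_to_idx[name] for name in labels])  # int64, shape [3]

# Random stand-in for the model output with shape [batch_size, nb_classes].
logits = torch.randn(len(labels), len(class_names))

# The targets are passed as class indices, no one-hot encoding needed.
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)
```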
The Dataset.__getitem__(self, index) method is called by the DataLoader with an index for each sample in the range [0, len(dataset) - 1] and is responsible for loading and returning the sample for the current index.
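To see this behavior, here is a toy sketch in which a hypothetical IndexLoggingDataset records every index it receives. It uses the default num_workers=0, so that the recorded list is filled in the main process.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class IndexLoggingDataset(Dataset):
    """Toy dataset that remembers which indices the DataLoader requested."""

    def __init__(self, size):
        self.size = size
        self.seen = []

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        self.seen.append(idx)
        # A real dataset would load and return the sample stored at `idx`.
        return torch.tensor(idx)


dataset = IndexLoggingDataset(10)
loader = DataLoader(dataset, batch_size=4, shuffle=True)
batches = [batch for batch in loader]

# Each index in [0, len(dataset) - 1] is requested exactly once per epoch.
print(sorted(dataset.seen))
```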
Sir, after loading the data I trained the model. The training dataset contains only 364 images, so I want the classes to be equally represented in my training and validation datasets. How can I do that? I am currently using mobilenet_v2 as the model. Since I have so few images, should I write my own model, or what can I do with this model to increase the accuracy?
364 images are not that many and your model might overfit quickly, especially since you would need to split this dataset into a training, validation, and test set.
You could try to use aggressive data augmentation and observe the validation loss to make sure the model still generalizes well.
I don’t think that a custom model trained from scratch would be easier, so your best bet might be to try to fine-tune a pretrained model, add data augmentation, maybe increase the regularization, or in the best case collect more data.
How can I stratify my image dataset so that the classes have the same proportions in my training and validation datasets? I have 8 classes, and every class must be present in both the training and the validation dataset in the same ratio.
Thanks for your help
You could use sklearn.model_selection.train_test_split with the stratify argument.
If you pass the sample indices as the input, this would return the indices for the training and validation datasets, which you could then pass to a Subset or SubsetRandomSampler.
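A small sketch of this approach with a toy TensorDataset as a stand-in for the real image dataset (80 random samples, 8 classes). The 80/20 split ratio and the random_state are arbitrary choices.

```python
from collections import Counter

import torch
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader, Subset, TensorDataset

# Toy stand-in for the real dataset: 80 samples, 8 classes, 10 samples each.
labels = [i % 8 for i in range(80)]
dataset = TensorDataset(torch.randn(80, 3, 8, 8), torch.tensor(labels))

# Split the indices (not the data) and stratify on the class labels.
indices = list(range(len(dataset)))
train_idx, val_idx = train_test_split(
    indices, test_size=0.2, stratify=labels, random_state=0
)

train_set = Subset(dataset, train_idx)
val_set = Subset(dataset, val_idx)
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
val_loader = DataLoader(val_set, batch_size=16, shuffle=False)

# Per-class sample counts in each split, to check the class balance.
train_counts = Counter(labels[i] for i in train_idx)
val_counts = Counter(labels[i] for i in val_idx)
print(train_counts, val_counts)
```

For the real dataset, the labels would come from the label column of the csv file, in the same order as the samples in the Dataset.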