Image Folder with no subfolders

Ayshine · October 24, 2018, 10:32am

Hi,
I am trying to load images from a folder with no subfolders and I get the error below. Reading a single .jpg image from the folder works without error, and my files meet the PyTorch image format criteria. transforms.Compose also works fine, so I don’t think anything is wrong with my PyTorch installation. Is there another way to load the image data, or anything else I can do to solve this error?
Thanks in advance!
Ayshine

ptrblck · October 24, 2018, 11:07am

torchvision.datasets.ImageFolder expects subfolders representing the classes containing images of the corresponding class.
If you just would like to load a single image, you could load it with e.g. PIL.Image.open and pass it to your transform.
However, if you don’t want to change your code, just move your image to a subfolder and ImageFolder should work.
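
As a rough sketch of the single-image route described above, the helper below opens one file with PIL and optionally applies a transform. The name load_image is made up for illustration, and transform can be any callable, for example a transforms.Compose pipeline.

```python
from PIL import Image


def load_image(path, transform=None):
    """Open one image file and optionally apply a transform to it."""
    # Convert to RGB so grayscale or RGBA files give a consistent 3-channel image.
    with Image.open(path) as img:
        image = img.convert("RGB")
    if transform is not None:
        image = transform(image)
    return image
```

For the ImageFolder route, the layout would be something like data/some_class/xxx.jpg, with data passed as the root directory (the folder names here are placeholders).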

Ayshine · October 31, 2018, 11:35am

Thanks @ptrblck,

Unfortunately, I need to load a dataset of 25,000 images. I had a couple of issues with the labels, too. Maybe I can write a script to arrange them in folders.

ptrblck · October 31, 2018, 10:56pm

How are the images labeled at the moment? Are the labels somehow written into the file names or do you have a separate list with the image paths and the corresponding label?

Ayshine · October 31, 2018, 11:31pm

There are a couple of label files, all in .mat format. I couldn’t find a way to open them, so I started converting the two relevant files to .csv. One of them labels whether an image contains a single person or not, and the other classifies images by main activity and sub-activity.

ptrblck · October 31, 2018, 11:43pm

You can load .mat files with scipy.io.loadmat.
If you could create one .csv file containing the image path and both labels, we could write a Dataset and load all corresponding images and labels using this file without moving your data.

Ayshine · November 1, 2018, 2:31pm

That would be great! Let me try to merge them all in one .csv then come back here.

ptrblck · November 1, 2018, 3:22pm

Sure! If you’ve created the .csv, could you post a few lines of the file so that we can write a dummy Dataset using your format?
Also, let me know, if you need some help creating the .csv from your MATLAB files.

Ayshine · November 2, 2018, 8:04am

Thanks for offering to help with the MATLAB files. My problem has now become more about reading MATLAB files. I am working on this dataset: MPII Human Pose Database
If you are also familiar with the paper, I would really appreciate your recommendations on how to feed this dataset to a deep learning model built with PyTorch. I could find the first image file name as follows. I’ll try to find a way to retrieve the rest.

ptrblck · November 3, 2018, 4:10pm

I had a look at the labels of the dataset, and it seems loadmat wraps the data in a ton of unnecessary nested arrays. I’ll try to load it with Octave and see how far I get.
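
One thing that may reduce the nesting is to pass squeeze_me=True and struct_as_record=False to scipy.io.loadmat. The sketch below is only an illustration: the helper name load_mat_variables is hypothetical, and it has not been tried against the actual annotation file of this dataset.

```python
from scipy.io import loadmat


def load_mat_variables(path):
    """Load a .mat file and return only the user-defined variables."""
    # squeeze_me drops singleton dimensions; struct_as_record=False exposes
    # MATLAB struct fields as Python attributes instead of nested record arrays.
    data = loadmat(path, squeeze_me=True, struct_as_record=False)
    # Keys such as __header__, __version__ and __globals__ are file metadata.
    return {k: v for k, v in data.items() if not k.startswith("__")}
```

Note that loadmat reads MAT-files up to version 7.2; files saved as v7.3 are HDF5-based and need a different reader.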

Ayshine · November 5, 2018, 8:20am

Oh, thanks! I managed to gather all the important info into one CSV file. Here are a couple of lines.

image_path,train_test,category,activity,single_person

029122914.jpg,1,‘occupation’, ‘truck driving, loading and unloading truck, tying down load, standing, walking and carrying heavy lo’,1
061185289.jpg,1,‘occupation’, ‘truck driving, loading and unloading truck, tying down load, standing, walking and carrying heavy lo’,1
013949386.jpg,1,‘occupation’, ‘truck driving, loading and unloading truck, tying down load, standing, walking and carrying heavy lo’,1
029214465.jpg,1,‘occupation’, ‘truck driving, loading and unloading truck, tying down load, standing, walking and carrying heavy lo’,1

ptrblck · November 6, 2018, 12:50pm

Awesome!
Did you manage to create the Dataset?
If not, could you explain the format of the labels a bit?
The first column obviously contains the image names. Which column refers to the labels and what kind of info do you need for your training?

Ayshine · November 6, 2018, 1:49pm

Sorry, I should’ve been clearer.

image_path: the name of the images
train_test: if 1, the image is training data; if 0, the image is test data
category: the main category of the action
activity: detailed activity of the person or people in the image
single_person: contains rectangle id ridx of sufficiently separated individuals (this explanation is from the database page I haven’t figured out how to use this information)

It looks like the test data does not have any category, activity or single_person information. After filtering for train data I no longer need the train_test column. For my initial model, I only take the category column as the label, but the single_person information is also crucial for my use case. I might also drop the activity column. In sum, my ultimate goal is a binary decision between the two states below:

There is one person on the image doing a regular activity
There are multiple people doing something else, or the figure in the image does not belong to one person doing a regular activity.

I might be heading nowhere by using this dataset, but I’d like to share my use case to explain more clearly why I chose the dataset in the first place.

ptrblck · November 6, 2018, 2:25pm

OK, I see.
Could you save the .csv with ; as a separator, as this will make the reading a bit easier, since activity uses , inside the text?
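
To illustrate why the separator matters, here is a small sketch using Python’s csv module with delimiter=";". The sample row is an abbreviated version of the lines posted above, and read_rows is just an illustrative helper name.

```python
import csv
import io

# Example row modelled on the thread; the activity text contains commas.
SAMPLE = (
    "image_path;train_test;category;activity;single_person\n"
    "029122914.jpg;1;occupation;truck driving, loading and unloading truck;1\n"
)


def read_rows(text, delimiter=";"):
    """Parse delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


rows = read_rows(SAMPLE)
print(rows[0]["activity"])  # the commas stay inside a single field
```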

Let’s make sure I understand your use case.
The “regular activity” is defined by category? If so, I assume you would like to group the activities into regular and not regular. Which categories would belong to which class?
Could you post an example for each case?

If single_person is set to 0, we know that there is more than one person in the image.
How do you know the person in the image is not doing the regular activity?

Ayshine · November 7, 2018, 1:10pm

I have now saved the file with ; as the separator. The three most important pieces of information I have are image_path, category and single_person. After a little further reading, single_person turns out to be a list holding the ids of the head rectangle coordinates. If the list has 0 entries, there is no one in the image; if it has 1, there is one person; if it has more, there are more people. I can use this column as my label to check whether there is exactly one person in the image, and then use category as further info. In this case, I have 3 distinct labels.

image_path,train_test,category,single_person
030424224.jpg;1;miscellaneous;[1,2]
052475643.jpg;1;miscellaneous;[1,2]
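
A Dataset reading such a file directly, without moving any images, could look roughly like the sketch below. It is written under several assumptions: the header row uses the same ; separator as the data rows, single_person is stored as a bracketed list such as [1,2], and only rows with train_test equal to 1 are kept. The names CsvImageDataset and person_count_label are made up for illustration.

```python
import csv
import json
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset


def person_count_label(single_person):
    """Map a bracketed id list like "[1,2]" to class 0, 1 or 2 (2 = two or more)."""
    ids = json.loads(single_person)
    return min(len(ids), 2)


class CsvImageDataset(Dataset):
    """Load (image, label) pairs listed in a ;-separated file."""

    def __init__(self, csv_path, image_dir, transform=None):
        self.image_dir = Path(image_dir)
        self.transform = transform
        with open(csv_path, newline="") as f:
            reader = csv.DictReader(f, delimiter=";")
            # Keep training rows only; test rows carry no labels.
            self.samples = [
                (row["image_path"], person_count_label(row["single_person"]))
                for row in reader
                if row["train_test"] == "1"
            ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        name, label = self.samples[index]
        image = Image.open(self.image_dir / name).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, label
```

When wrapping this in a DataLoader, the transform should end in something that produces tensors of equal size (for example a resize followed by transforms.ToTensor()), since the default batching cannot stack PIL images.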

Ayshine · November 11, 2018, 12:34am

I decided to build a model with classes 0, 1 or 2 (2 for 2 or more people) and moved my images into a data folder using this code. It took seconds.

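import shutil
from pathlib import Path

# filenames, train_test and sps are parallel lists read from the .csv file:
# image name, train/test flag (as a string) and the list of person ids.
src_dir = Path('images')
dst_dir = Path('data')

for filename, flag, sp in zip(filenames, train_test, sps):
    src = src_dir / filename
    # Skip test images ('0') and files that are missing on disk.
    if flag == '0' or not src.is_file():
        continue
    # Class 0: no person, 1: one person, 2: two or more people.
    label = min(len(sp), 2)
    target = dst_dir / str(label)
    target.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, target / filename)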

I removed a couple of things, so my data in the .csv file now looks like this:

015601864.jpg;1;sports;curling;12
015599452.jpg;1;sports;curling;3

I’m writing this in case it helps someone else too, and marking this reply as the solution.

Steven_Mugisha · February 7, 2020, 9:07am

Considering that test images are in most cases not categorized into classes as train images are, does PyTorch have a way of loading such images, or do I have to write a custom loop to load all of them?

ptrblck · February 8, 2020, 8:12am

A custom Dataset would probably be “cleaner”, but you could also load all images from a single folder and just discard the target.
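
A minimal sketch of such a custom Dataset for a flat folder of unlabeled images is shown below. The class name UnlabeledImageDataset and the extension list are assumptions for illustration; it returns the file name in place of a target so predictions can be matched back to files.

```python
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset

# Assumed set of extensions; adjust to the files actually present.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


class UnlabeledImageDataset(Dataset):
    """Yield (image, file name) pairs from one folder without class subfolders."""

    def __init__(self, root, transform=None):
        # Sort the paths so the sample order is reproducible across runs.
        self.paths = sorted(
            p for p in Path(root).iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        path = self.paths[index]
        image = Image.open(path).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, path.name
```

As with any image Dataset, the transform should convert images to equally sized tensors before the samples are batched by a DataLoader.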