Iterate Folder that contains Images

idriss · May 19, 2022, 9:38am

Hi guys, I'm new to PyTorch.
I want to load my data from a folder that contains 9 images, but I can't view all 9 of them. I only managed to view a single image, which changes each time I run my program.

import os

import matplotlib.pyplot as plt
import pandas
import torch
from skimage.io import imread  # assumption: the post does not show where imread comes from
from torch.utils.data import Dataset, DataLoader


class Data_set_Papy(Dataset):
    def __init__(self, csv_file, root_directory_image, transform=None, target_transform=None, train=True):
        self.annotations = pandas.read_csv(csv_file)
        self.root_directory_image = root_directory_image
        self.transform = transform
        self.target_transform = target_transform
        self.train = train

    def __len__(self):
        return len(self.annotations)  # number of rows in the CSV file

    def __getitem__(self, index):
        # Return one image and its target; column 0 holds the file name, column 1 the label
        our_image_path = os.path.join(self.root_directory_image, self.annotations.iloc[index, 0])
        image = imread(our_image_path)
        y_label = torch.tensor(int(self.annotations.iloc[index, 1]))

        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            y_label = self.target_transform(y_label)  # apply to the label, not the image

        return image, y_label




training_data = Data_set_Papy(
    csv_file='index_papyrus.csv',
    root_directory_image='input_frags_papyrus',
    transform=None,
    target_transform=None,
    train=True,
)
testing_data = Data_set_Papy(
    csv_file='index_papyrus.csv',
    root_directory_image='input_frags_papyrus',
    transform=None,
    target_transform=None,
    train=False,
)

train_dataloader = DataLoader(training_data, batch_size=1, shuffle=True)
test_dataloader = DataLoader(testing_data, batch_size=1, shuffle=False)

for data in train_dataloader:
    print(data)
    break
plt.imshow(data[0][0])
plt.show()

Andrei_Cristea · May 19, 2022, 9:43am

Hello, the code you sent only shows 1 image for two reasons. You specified batch_size=1 in the definition of train_dataloader, so each iteration over the dataloader returns a single image. You also have a break in your loop, so you stop after that first iteration (and therefore after the first image). If you’d like to see more than one image, remove the break from the loop.
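
To make the effect of batch_size=1 and break concrete, here is a minimal sketch on a toy dataset. The TensorDataset of nine random "images" is a stand-in assumption, not the papyrus data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data (assumption): 9 fake RGB images of size 4x4 with labels 0..8
images = torch.rand(9, 4, 4, 3)
labels = torch.arange(9)
toy_data = TensorDataset(images, labels)

loader = DataLoader(toy_data, batch_size=1, shuffle=True)

# With `break`, the loop body runs exactly once, so only one image is seen
seen_with_break = 0
for batch_images, batch_labels in loader:
    seen_with_break += batch_images.shape[0]
    break

# Without `break`, every batch (here: every image) is visited
seen_without_break = 0
for batch_images, batch_labels in loader:
    seen_without_break += batch_images.shape[0]

print(seen_with_break, seen_without_break)  # 1 9
```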

idriss · May 19, 2022, 9:48am

Thank you for your answer. I tried removing the break, but still only one image is displayed. I also tried changing the batch size and got an error: RuntimeError: stack expects each tensor to be equal size, but got [850, 480, 3] at entry 0 and [503, 625, 3] at entry 1

That's why I set the batch size to 1.

Andrei_Cristea · May 19, 2022, 11:07am

In terms of the display, in the code you pasted you only run one display command (which is plt.imshow(data[0][0])) and you run it after the loop has completed, so it’s expected that it will only display one image (specifically it’s displaying the image corresponding to the last datapoint it iterated through).
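
As a rough sketch of what "display inside the loop" could look like, the snippet below creates one figure per batch. The random TensorDataset is an assumption standing in for your images, and the figures list is only there to show how many plots were made.

```python
import matplotlib.pyplot as plt
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data (assumption): 9 fake RGB images in height x width x channel layout
toy_data = TensorDataset(torch.rand(9, 8, 8, 3), torch.arange(9))
loader = DataLoader(toy_data, batch_size=1, shuffle=False)

figures = []
for batch_images, batch_labels in loader:
    # The display call sits inside the loop, so it runs once per batch
    fig, ax = plt.subplots()
    ax.imshow(batch_images[0].numpy())  # first (and only) image of the batch
    ax.set_title(f"label {batch_labels[0].item()}")
    figures.append(fig)

print(len(figures))  # one figure per image
# Calling plt.show() at this point opens all of them in an interactive session
```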

Regarding the RuntimeError, that’s likely because your images are of different shape, and indeed to include them in the same batch you must resize them to match. You can do that by passing a Resize transform to your training_data constructor, and you have to experiment with what common size works best.

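from torchvision import transforms
...
# Resize expects a PIL image or a tensor, so convert the array from imread first
# (assuming imread returns a NumPy array)
resize_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
training_data = Data_set_Papy(
    csv_file='index_papyrus.csv',
    root_directory_image='input_frags_papyrus',
    transform=resize_transform,  # this
    target_transform=None,
    train=True
)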
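
To see where the error comes from, here is a small sketch that reproduces it with two fake images of the shapes from your message. By default the DataLoader builds a batch with torch.stack, which needs equal shapes. To keep the sketch torch-only, it resizes with torch.nn.functional.interpolate rather than transforms.Resize; resize_hwc is a hypothetical helper written just for this illustration.

```python
import torch
import torch.nn.functional as F

# Two fake images with the shapes from the error message (height, width, channels)
a = torch.rand(850, 480, 3)
b = torch.rand(503, 625, 3)

try:
    torch.stack([a, b])  # default batching stacks samples like this
except RuntimeError as err:
    print("stack failed:", err)

def resize_hwc(img, size=(224, 224)):
    """Resize one H x W x C tensor to `size` (hypothetical helper for this sketch)."""
    chw = img.permute(2, 0, 1).unsqueeze(0)  # -> 1 x C x H x W
    out = F.interpolate(chw, size=size, mode="bilinear", align_corners=False)
    return out.squeeze(0).permute(1, 2, 0)   # -> H x W x C

batch = torch.stack([resize_hwc(a), resize_hwc(b)])
print(batch.shape)  # torch.Size([2, 224, 224, 3])
```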

idriss · May 19, 2022, 11:53am

So I changed the transformation block as you told me and changed the way I load the data. I also changed the batch size to 32.
I still have the problem of viewing just a single image: image = images[2][0] gives me the image at position 3, and I can only change the index to get another position.
I'm trying to find a way to load all my images with a loop, but I don’t know how. The error is fixed. Below are the modifications I made to the code, and a screenshot of the output I get.

class Data_set_Papy(Dataset):
    def __init__(self, csv_file, root_directory_image, transform=None, target_transform=None, train=True):
        self.annotations = pandas.read_csv(csv_file)
        self.root_directory_image = root_directory_image
        self.transform = transform
        self.target_transform = target_transform
        self.train = train

    def __len__(self):
        return len(self.annotations)  # number of rows in the CSV file

    def __getitem__(self, index):
        # Return one image and its target; column 0 holds the file name, column 1 the label
        our_image_path = os.path.join(self.root_directory_image, self.annotations.iloc[index, 0])
        image = imread(our_image_path)
        y_label = torch.tensor(int(self.annotations.iloc[index, 1]))

        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            y_label = self.target_transform(y_label)  # apply to the label, not the image

        return image, y_label



transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(255),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
training_data = Data_set_Papy(
    csv_file='index_papyrus.csv',
    root_directory_image='input_frags_papyrus',
    transform=transform,
    target_transform=None,
    train=True,
)
testing_data = Data_set_Papy(
    csv_file='index_papyrus.csv',
    root_directory_image='input_frags_papyrus',
    transform=transform,
    target_transform=None,
    train=False,
)

train_dataloader = DataLoader(training_data, batch_size=32, shuffle=True)
test_dataloader = DataLoader(testing_data, batch_size=1, shuffle=False)

images, labels = next(iter(train_dataloader))
  
# print the total number of samples in the batch
print('Number of samples: ', len(images))
image = images[2][0]  # third sample in the batch, first channel
plt.imshow(image, cmap='gray')
plt.show()

# print the size of the image
print("Image Size: ", image.size())

Those are the modifications.
I have 8 samples with a size of (224, 224).

Andrei_Cristea · May 19, 2022, 12:14pm

Glad the error is gone!

Regarding wanting to see the 3rd image, right now your train_dataloader has shuffle=True which means it’s randomizing the order of the data, so you won’t be able to index the result from the dataloader and get the image you want.

Try this instead:

image, label = training_data[2]  # load directly from the dataset, not from the dataloader
plt.imshow(image.permute(1, 2, 0))  # ToTensor gives (C, H, W); imshow expects (H, W, C)
plt.show()
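
Here is a small sketch of the difference between indexing the dataset and indexing a shuffled batch. The toy dataset is an assumption: each fake "image" at index i is simply filled with the value i so the order is easy to read.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data (assumption): the "image" at index i is filled with the value i
images = torch.stack([torch.full((2, 2), float(i)) for i in range(9)])
toy_data = TensorDataset(images, torch.arange(9))

# Indexing the dataset is deterministic: index 2 is always the third sample
image, label = toy_data[2]
print(label.item())  # 2

# A shuffled loader returns the same samples, but in a random order
loader = DataLoader(toy_data, batch_size=9, shuffle=True)
batch_images, batch_labels = next(iter(loader))
print(batch_labels.tolist())          # some permutation of 0..8
print(sorted(batch_labels.tolist()))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```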

idriss · May 19, 2022, 12:25pm

I mean, if I want the third image, the previous code works, but I want all the images at once! I want to see all 9 images.

Andrei_Cristea · May 19, 2022, 12:44pm

OK, try this:

from matplotlib import pyplot as plt

fig, axes = plt.subplots(nrows=3, ncols=3)  # set up a 3x3 grid to visualize
image_list = [training_data[i][0] for i in range(len(training_data))]
for img, ax in zip(image_list, axes.ravel()):
    ax.imshow(img.permute(1, 2, 0))  # (C, H, W) tensor -> (H, W, C) for matplotlib
fig.set_size_inches(10, 10)
plt.show()

idriss · May 19, 2022, 1:10pm

It seems to be working, but I get this at the end, at image position 8.

idriss · May 19, 2022, 1:14pm

I figured out that the first image in my folder doesn't load, which explains the empty spot.

Andrei_Cristea · May 19, 2022, 1:22pm

Glad it worked, good luck with your project, seems very cool.

idriss · May 19, 2022, 1:26pm

Thanks. Do you have any idea why the first image in my folder doesn't load?

idriss · May 19, 2022, 1:47pm

I fixed the problem. Thank you so much for your time!

Andrei_Cristea · May 19, 2022, 1:48pm

In your output it says “Number of samples: 8”, indicating you only have 8 images. So it seems like your dataset actually has 8 images, not 9 as you said previously. You can display them more nicely using plt.subplots(nrows=2, ncols=4) and then plt.gcf().set_size_inches(12, 6) now that you know there are just 8 of them.
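
If you don't want to hard-code the grid, a sketch like the following derives the number of rows from the number of images and leaves unused cells blank. show_grid is a hypothetical helper for illustration, and the random arrays are stand-ins for your images.

```python
import math

import matplotlib.pyplot as plt
import numpy as np

def show_grid(images, ncols=4):
    """Plot `images` on a grid with just enough rows; leave unused cells blank."""
    nrows = math.ceil(len(images) / ncols)
    fig, axes = plt.subplots(nrows=nrows, ncols=ncols, squeeze=False)
    flat_axes = axes.ravel()
    for ax, img in zip(flat_axes, images):
        ax.imshow(img)
    for ax in flat_axes:
        ax.axis("off")  # hides ticks everywhere, so empty cells stay blank
    fig.set_size_inches(3 * ncols, 3 * nrows)
    return fig, axes

# Stand-in data (assumption): 8 random RGB images in H x W x C layout
fake_images = [np.random.rand(16, 16, 3) for _ in range(8)]
fig, axes = show_grid(fake_images, ncols=4)
print(axes.shape)  # (2, 4)
```

With 8 images and ncols=4 this gives the same 2x4 layout and 12x6 inch figure as above; with ncols=3 it would give a 3x3 grid with one blank cell.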

If you were expecting 9 images total, check your csv file and make sure all 9 are in there.
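
One possible cause worth ruling out (an assumption on my part, since I haven't seen your csv file): pandas.read_csv treats the first line as a header by default, so a file with 9 rows and no header line yields only 8 samples, and the first image is the one that goes missing. The sketch below uses a made-up in-memory csv with hypothetical file names to show the difference that header=None makes.

```python
import io

import pandas as pd

# Stand-in CSV (assumption): 9 rows of "filename,label" with no header row
csv_text = "\n".join(f"frag_{i}.png,{i}" for i in range(9))

# Default: the first row is consumed as column names, leaving 8 data rows
with_default = pd.read_csv(io.StringIO(csv_text))
print(len(with_default))  # 8

# header=None keeps every row as data
without_header = pd.read_csv(io.StringIO(csv_text), header=None)
print(len(without_header))        # 9
print(without_header.iloc[0, 0])  # frag_0.png
```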

idriss · May 19, 2022, 1:48pm

That's what I did. Thank you so much, I learned a lot of things.
Best!