Multiple errors while trying to load data

Pytorch_developer · April 5, 2022, 3:22pm

This is my code:
import torch
import torchvision
from torchvision import transforms, datasets
import torch.nn as nn
import torch.nn.functional as f
import torch.optim as optim
import os
import cv2
import numpy as np
from tqdm import tqdm

rebuild_data = True

class cats_and_dogs():
    IMG_SIZE = 50  # was image_size, but the method reads self.IMG_SIZE
    # Relative paths: resolved against the current working directory
    cats = "PetImages/Cat"
    dogs = "PetImages/Dog"
    labels = {cats: 0, dogs: 1}

    training_data = []
    catcount = 0
    dogcount = 0

    def make_training_data(self):
        for label in self.labels:
            print(label)
            for f in tqdm(os.listdir(label)):
                try:
                    path = os.path.join(label, f)
                    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
                    img = cv2.resize(img, (self.IMG_SIZE, self.IMG_SIZE))
                    self.training_data.append([np.array(img), np.eye(2)[self.labels[label]]])

                    if label == self.cats:
                        self.catcount += 1
                    elif label == self.dogs:
                        self.dogcount += 1

                except Exception as e:
                    pass  # skips unreadable images, but also hides any other error

        np.random.shuffle(self.training_data)
        # dtype=object because each entry mixes a 50x50 image with a 2-element label
        np.save("training_data.npy", np.array(self.training_data, dtype=object))

        print("cats: ", self.catcount)
        print("dogs: ", self.dogcount)

if rebuild_data == True:
    Dogs_and_Cats = cats_and_dogs()
    Dogs_and_Cats.make_training_data()
And this is the error:
PetImages/Cat
Traceback (most recent call last):
  File "/Users/edenbrown/sample.ws45/new.py", line 51, in <module>
    Dogs_and_Cats.make_training_data()
  File "/Users/edenbrown/sample.ws45/new.py", line 27, in make_training_data
    for f in tqdm(os.listdir(label)):
FileNotFoundError: [Errno 2] No such file or directory: 'PetImages/Cat'

Matias_Vasquez · April 5, 2022, 3:30pm

Hi,

you might want to look at this tutorial on how to create your own Dataset class.

Also, if your images are already divided by class, I would suggest using ImageFolder to create your Dataset more easily.

https://pytorch.org/vision/main/generated/torchvision.datasets.ImageFolder.html

However, the error itself says that Python does not find that location.
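A quick way to see where a relative path such as "PetImages/Cat" actually points is to print the working directory that Python resolves it against. This is a small diagnostic sketch; `check_data_dir` is a hypothetical helper name, not part of any library:

```python
import os

def check_data_dir(rel_path: str) -> bool:
    """Print where a relative path resolves to and whether that folder exists."""
    # Relative paths are resolved against the current working directory
    # (os.getcwd()), not against the folder that contains the script.
    print("Working directory:", os.getcwd())
    print("Resolved path:", os.path.abspath(rel_path))
    exists = os.path.isdir(rel_path)
    print("Folder exists:", exists)
    return exists

if not check_data_dir("PetImages/Cat"):
    # Either start the script from the folder that contains PetImages, or build
    # an absolute path, e.g. from the script's own location:
    #   os.path.join(os.path.dirname(os.path.abspath(__file__)), "PetImages", "Cat")
    print("Folder not found; see the hint in the comment above.")
```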

Could you try putting an “r” before your strings for the cats and dogs variables?
Like this:

cats = r"PetImages/Cat"

Please let me know if this helps or if you still have this problem.

Pytorch_developer · April 5, 2022, 3:35pm

The r before the string did not help, and I am still confused, even with the tutorial.
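For reference, that is expected: the `r` prefix only changes how backslashes are interpreted, so it makes no difference for a path written with forward slashes. A small check, assuming standard CPython string semantics:

```python
# Forward slashes are not escape characters, so both spellings are the same string.
print(r"PetImages/Cat" == "PetImages/Cat")  # True

# The prefix matters for Windows-style paths: in the normal string "\n" is a newline.
normal = "C:\new"
raw = r"C:\new"
print(len(normal), len(raw))  # 5 6
```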

Matias_Vasquez · April 5, 2022, 3:51pm

Ok,

let’s try it with ImageFolder. (I think this might help you, since it takes care of most of what you need to do.)

First of all, you need your data divided into folders, where each folder contains only one class.

Judging by your code, you already have this.

So to create your Dataset, you only need to pass the path to the root directory that contains your class folders.

from torchvision.datasets import ImageFolder
import torchvision.transforms as T

path = "[YOUR_PATH_TO_THE_ROOT]/root"
transform = T.ToTensor()

dataset = ImageFolder(root=path, transform=transform)

So you only need to define the path up to the root, where the folders for all of your classes are.

After this, you can access your dataset like this:

index = 0
img, lbl = dataset[index]

This is a format that the DataLoader can handle… but that comes later. For now, try this, and if you get stuck please let me know.
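Roughly speaking, ImageFolder keeps a list of `(path, class_index)` pairs in `dataset.samples`; indexing loads the image at that path, applies `transform`, and returns it together with the class index. The following is a simplified, pure-Python stand-in for that behavior. The class name `FolderDatasetSketch` and the dummy loader are hypothetical; the real class also supports `target_transform` and opens images with PIL:

```python
from typing import Any, Callable, List, Optional, Tuple

class FolderDatasetSketch:
    """Hypothetical, simplified version of ImageFolder's indexing logic."""

    def __init__(self, samples: List[Tuple[str, int]],
                 loader: Callable[[str], Any],
                 transform: Optional[Callable[[Any], Any]] = None):
        self.samples = samples      # (path, class_index) pairs, like ImageFolder.samples
        self.loader = loader        # opens the file at `path`
        self.transform = transform  # e.g. T.ToTensor() in the real code

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Tuple[Any, int]:
        path, label = self.samples[index]
        img = self.loader(path)
        if self.transform is not None:
            img = self.transform(img)
        return img, label

# Dummy loader that returns a string instead of reading an image file.
dataset = FolderDatasetSketch(
    samples=[("PetImages/Cat/0.jpg", 0), ("PetImages/Dog/0.jpg", 1)],
    loader=lambda path: f"<image {path}>",
)
img, lbl = dataset[1]
print(img, lbl)  # <image PetImages/Dog/0.jpg> 1
```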

Pytorch_developer · April 5, 2022, 4:00pm

So should my code look like this?

from torchvision.datasets import ImageFolder
import torchvision.transforms as T

path = "[PetImages/Cat, PetImages/Dog]/root"
transform = T.ToTensor()

dataset = ImageFolder(root=path, transform=transform)

index = 0
img, lbl = dataset[index]

And also, what does lbl mean?

Matias_Vasquez · April 5, 2022, 4:18pm

I have now put in an example path to show how it might look. By “root” I meant the folder shown in the image I posted; in your case, the “root” folder is “PetImages”. But you still have to provide the full path so that Python can find those files.

from torchvision.datasets import ImageFolder
import torchvision.transforms as T

path = "C:/Users/Matias/Desktop/PetImages"
# Do not include dogs or cats in the path, just up to PetImages
transform = T.ToTensor()

dataset = ImageFolder(root=path, transform=transform)

index = 0
img, lbl = dataset[index]

Your code might look something like this.

lbl is the label of the class the image belongs to.
This means that the class cat might have label 0 and the class dog label 1 (or the other way around).
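Which way round it is can be checked: ImageFolder treats each immediate subfolder of `root` as a class, sorts the folder names, and numbers them from 0, so with `Cat` and `Dog` folders cats should get 0 and dogs 1 (`dataset.class_to_idx` shows the mapping on the real dataset). The sketch below mimics that rule in plain Python on a throwaway folder tree; `find_classes_sketch` is a hypothetical re-implementation, not the torchvision function itself:

```python
import os
import tempfile

def find_classes_sketch(root: str):
    # Every immediate subfolder of root is a class; names are sorted
    # alphabetically and numbered from 0.
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return classes, {name: i for i, name in enumerate(classes)}

with tempfile.TemporaryDirectory() as root:
    for name in ("Dog", "Cat"):  # creation order does not matter
        os.makedirs(os.path.join(root, name))
    classes, class_to_idx = find_classes_sketch(root)

print(class_to_idx)  # {'Cat': 0, 'Dog': 1}
```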

Let me know if it now works.

Pytorch_developer · April 5, 2022, 4:46pm

This worked, but I would like to know how to shuffle the dataset so that the image and label at a given index are not always the same.

Matias_Vasquez · April 5, 2022, 5:05pm

Ok,

If you want your images to be in a random order, you can do something like this:

import random

dataset.samples = sorted(dataset.samples, key=lambda k:random.random())
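Sorting on a random key is one way to shuffle a list; `random.shuffle` does the same in place. Because each entry of `dataset.samples` is a `(path, label)` pair, shuffling moves the pair as a unit, so an image never gets separated from its label. A small illustration on made-up sample entries:

```python
import random

samples = [("Cat/0.jpg", 0), ("Cat/1.jpg", 0), ("Dog/0.jpg", 1), ("Dog/1.jpg", 1)]

# The approach from the post: sort by a random key (returns a new list).
shuffled = sorted(samples, key=lambda k: random.random())

# Equivalent, more idiomatic in-place version.
in_place = list(samples)
random.shuffle(in_place)

# Same (path, label) pairs, only the order changes.
assert sorted(shuffled) == sorted(samples)
assert sorted(in_place) == sorted(samples)
print(shuffled)
```

One caveat, as far as I can tell from torchvision's `DatasetFolder`: it also stores `dataset.targets` (the labels in the original order), and reassigning `dataset.samples` does not update it. Indexing reads `samples`, so `dataset[i]` stays consistent, but code that reads `targets` directly would not.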

Now you can create a DataLoader. With shuffle=True, it will give you batches of randomly chosen images from your dataset.

You can do something like this:

from torch.utils.data import DataLoader

dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
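What `shuffle=True` does can be pictured as drawing a fresh random order of all indices at the start of every pass over the data and then cutting it into batches of `batch_size` (the last batch is smaller unless `drop_last=True`). This plain-Python sketch models that behavior with a hypothetical helper `epoch_batches`; it is not the DataLoader's actual code, which uses a sampler object for this:

```python
import random
from typing import List

def epoch_batches(num_samples: int, batch_size: int, rng: random.Random) -> List[List[int]]:
    # A new random permutation of all indices for each epoch ...
    indices = list(range(num_samples))
    rng.shuffle(indices)
    # ... cut into consecutive chunks; the last one may be shorter.
    return [indices[i:i + batch_size] for i in range(0, num_samples, batch_size)]

rng = random.Random(0)
batches = epoch_batches(num_samples=70, batch_size=32, rng=rng)
print([len(b) for b in batches])  # [32, 32, 6]
```

Because every epoch already gets its own permutation, shuffling `dataset.samples` beforehand is not required when `shuffle=True` is used.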

You can look in the documentation that I have linked to get a better understanding.

This will be useful when you want to iterate over the full dataset to train a model.

# Dumb example of how you might use the dataloader
for i, (imgs, lbls) in enumerate(dataloader):
    print(i, lbls) 
    # this is not useful, you will do something better than just print 
    #   the batch number, and the labels for this batch

If you look at the first link I gave you, further down in the “Iterating through the dataset” section, they show how to use this.

Hope this helps

Pytorch_developer · April 5, 2022, 5:40pm

I do not know if I did something wrong, but I got this error:

Traceback (most recent call last):
  File "/Users/edenbrown/sample.ws45/new.py", line 15, in <module>
    for i, (imgs, lbls) in enumerate(dataloader):
  File "/Users/edenbrown/opt/anaconda3/envs/env_pytorch3/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/Users/edenbrown/opt/anaconda3/envs/env_pytorch3/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 570, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/Users/edenbrown/opt/anaconda3/envs/env_pytorch3/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/Users/edenbrown/opt/anaconda3/envs/env_pytorch3/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 172, in default_collate
    return [default_collate(samples) for samples in transposed]  # Backwards compatibility.
  File "/Users/edenbrown/opt/anaconda3/envs/env_pytorch3/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 172, in <listcomp>
    return [default_collate(samples) for samples in transposed]  # Backwards compatibility.
  File "/Users/edenbrown/opt/anaconda3/envs/env_pytorch3/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py", line 138, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [3, 375, 500] at entry 0 and [3, 150, 200] at entry 2

My code is:

from torchvision.datasets import ImageFolder
import torchvision.transforms as T
import random
from torch.utils.data import DataLoader

path = "/Users/edenbrown/Downloads/kagglecatsanddogs_3367a"
transform = T.ToTensor()

dataset = ImageFolder(root=path, transform=transform)

dataset.samples = sorted(dataset.samples, key=lambda k:random.random())

dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

for i, (imgs, lbls) in enumerate(dataloader):
    print(i, lbls)

Pytorch_developer · April 5, 2022, 5:42pm

I think the error may have something to do with image size, because some of the images are different sizes.

Matias_Vasquez · April 5, 2022, 5:46pm

If you have images of different sizes, you can try resizing them within the transform when creating the dataset.

transform = T.Compose([
    T.Resize((150, 200)),
    T.ToTensor()
])
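Note that a tuple passed to `T.Resize` is read as `(height, width)`, so `T.Resize((150, 200))` stretches every image to exactly 150×200 and can distort its aspect ratio. Passing a single int instead resizes the shorter edge to that value and scales the other edge proportionally, which keeps the aspect ratio but not a fixed size, so it would usually be followed by something like `T.CenterCrop((150, 200))` before batching. The sketch below mimics only that output-size rule in plain Python; `resized_size` is a hypothetical helper, and it ignores Resize's optional `max_size` argument:

```python
from typing import Tuple, Union

def resized_size(h: int, w: int, size: Union[int, Tuple[int, int]]) -> Tuple[int, int]:
    """Output (height, width) of a Resize-style transform (simplified sketch)."""
    if isinstance(size, tuple):
        return size  # exact (height, width); aspect ratio is not preserved
    # int: the shorter edge becomes `size`, the longer edge keeps the ratio
    if h <= w:
        return size, int(size * w / h)
    return int(size * h / w), size

print(resized_size(375, 500, (150, 200)))  # (150, 200)
print(resized_size(375, 500, 150))         # (150, 200)
print(resized_size(500, 375, 150))         # (200, 150): portrait stays portrait
```

So with an int size, a mix of portrait and landscape photos would still produce different shapes, which is why a fixed tuple size or a crop is needed before the DataLoader can stack a batch.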

Pytorch_developer · April 5, 2022, 5:54pm

This worked, so thank you.

Pytorch_developer · April 5, 2022, 6:21pm

Also, I do not believe this part:

import random

dataset.samples = sorted(dataset.samples, key=lambda k:random.random())

is needed, as the code still works perfectly without it.

Also, the labels are the same for cats and dogs, but I will figure out how to fix that in a different post.
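A likely cause of the identical labels, assuming the download has the layout `kagglecatsanddogs_3367a/PetImages/Cat` and `kagglecatsanddogs_3367a/PetImages/Dog`: with `root` set to `kagglecatsanddogs_3367a`, the only immediate subfolder is `PetImages`, so ImageFolder sees a single class and every image (found recursively under it) gets label 0. Pointing `root` one level deeper, at `/Users/edenbrown/Downloads/kagglecatsanddogs_3367a/PetImages`, should give two classes. The sketch below reproduces the class-discovery rule on an empty folder tree; `find_classes_sketch` is a hypothetical stand-in for what ImageFolder does internally:

```python
import os
import tempfile

def find_classes_sketch(root: str):
    # Mimics ImageFolder: each immediate subfolder of root is one class.
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: i for i, name in enumerate(classes)}

with tempfile.TemporaryDirectory() as tmp:
    download = os.path.join(tmp, "kagglecatsanddogs_3367a")
    for name in ("Cat", "Dog"):
        os.makedirs(os.path.join(download, "PetImages", name))

    too_high = find_classes_sketch(download)
    correct = find_classes_sketch(os.path.join(download, "PetImages"))

print(too_high)  # {'PetImages': 0}
print(correct)   # {'Cat': 0, 'Dog': 1}
```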