Hi, I am new to PyTorch and I am trying to use a dataset from Kaggle to train a neural network.
The dataset is a classic one (Dogs vs. Cats) and can be found here: https://www.kaggle.com/c/dogs-vs-cats
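However, after downloading it, I run:
```python
import torch
from torchvision import datasets, transforms

data_dir = 'cat_dog_data/train'
# Resize the shorter side to 255 px, crop the central 224x224 region, convert to a tensor
transform = transforms.Compose([transforms.Resize(255),
                                transforms.CenterCrop(224),
                                transforms.ToTensor()])
dataset = datasets.ImageFolder(data_dir, transform=transform)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
```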
I get the following error:
```
RuntimeError: Found 0 files in subfolders of: PATHTOTHEFOLDER/intro-to-pytorch/cat_dog_data/train
Supported extensions are: .jpg,.jpeg,.png,.ppm,.bmp,.pgm,.tif,.tiff,.webp
```
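For context, `datasets.ImageFolder` treats each immediate subdirectory of the root it is given as one class and collects images only from inside those subdirectories; files sitting directly in the root are ignored. The stdlib-only sketch below approximates that discovery step (it is a simplification, not torchvision's actual code, and `find_class_dirs` is a hypothetical helper) to show why the flat `train` folder yields no images, while pointing one level up makes every subfolder of `cat_dog_data`, such as `train`, a class of its own.

```python
import os
import tempfile
from pathlib import Path

# Extensions listed in the error message above.
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".ppm", ".bmp",
                  ".pgm", ".tif", ".tiff", ".webp")


def find_class_dirs(root):
    """Hypothetical helper approximating ImageFolder's class discovery.

    Every immediate subdirectory of `root` becomes a class, and images are
    counted only inside those subdirectories (recursively). Files placed
    directly in `root` are ignored. Returns {class_name: image_count}.
    """
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    counts = {}
    for cls in classes:
        counts[cls] = sum(
            1
            for _, _, filenames in os.walk(Path(root) / cls)
            for name in filenames
            if name.lower().endswith(IMG_EXTENSIONS)
        )
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        train = Path(tmp) / "cat_dog_data" / "train"
        train.mkdir(parents=True)
        for name in ("cat.1.jpg", "cat.2.jpg", "dog.1.jpg"):
            (train / name).touch()
        # Flat layout: no subfolders, hence no classes and no images.
        print(find_class_dirs(train))         # {}
        # One level up: "train" itself becomes the only class.
        print(find_class_dirs(train.parent))  # {'train': 3}
```

In these terms, removing `/train` makes the error disappear only because `train` is then counted as a single class, so every image would get the same label.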
If I remove `/train` from `data_dir`, the error goes away. However, I think that would be wrong, since I believe the structure I should have is:
```
cat_dog_data/train/dogs
cat_dog_data/train/cats
```
Instead, I have something like this:
```
cat_dog_data/train/cat.13.jpg
cat_dog_data/train/cat.11.jpg
...
cat_dog_data/train/dog.1.jpg
cat_dog_data/train/dog.21.jpg
...
```
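In this layout the label is already encoded in each filename prefix (`cat.` or `dog.`). As a hedged sketch of one alternative to moving files, the hypothetical helper `samples_from_filenames` below builds the same kind of `(path, class_index)` list that ImageFolder keeps in its `samples` attribute, directly from the flat folder. It could then back a small custom `torch.utils.data.Dataset` subclass whose `__getitem__` opens the image at `path` (for example with PIL) and applies `transform`; only the stdlib labeling part is shown here, and the extension list is an assumption.

```python
import os
import tempfile
from pathlib import Path


def samples_from_filenames(root, extensions=(".jpg", ".jpeg", ".png")):
    """Hypothetical helper: derive labels from names like 'cat.13.jpg'.

    Returns (classes, samples): classes is a sorted list of class names and
    samples is a list of (path, class_index) pairs, the same shape as
    ImageFolder's `classes` and `samples` attributes.
    """
    paths = sorted(
        entry.path
        for entry in os.scandir(root)
        if entry.is_file() and entry.name.lower().endswith(extensions)
    )
    # Class name = text before the first dot: 'cat.13.jpg' -> 'cat'.
    labels = [os.path.basename(p).split(".", 1)[0] for p in paths]
    classes = sorted(set(labels))
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = [(p, class_to_idx[label]) for p, label in zip(paths, labels)]
    return classes, samples


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("cat.13.jpg", "cat.11.jpg", "dog.1.jpg"):
            Path(tmp, name).touch()
        classes, samples = samples_from_filenames(tmp)
        print(classes)  # ['cat', 'dog']
        print([(os.path.basename(p), i) for p, i in samples])
        # [('cat.11.jpg', 0), ('cat.13.jpg', 0), ('dog.1.jpg', 1)]
```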
Should I manually create the folders `dogs` and `cats`, and remove the `dog.` and `cat.` prefixes from the `.jpg` filenames? This seems odd, but I don't see how else to proceed.
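If you go the folder route, it can be scripted, and the files should not need renaming: ImageFolder takes the label from the folder name, not from the file name. A minimal sketch, assuming the flat `cat_dog_data/train` layout shown above; `split_into_class_folders` is a hypothetical helper that moves each image into a subfolder named after its prefix (`cat/` or `dog/`). Because it moves files in place, try it on a copy of the data first.

```python
import shutil
import tempfile
from pathlib import Path


def split_into_class_folders(root, extensions=(".jpg", ".jpeg", ".png")):
    """Hypothetical helper: move root/cat.13.jpg to root/cat/cat.13.jpg, etc.

    The class folder is named after the filename prefix (text before the
    first dot). Files keep their names; only their location changes.
    Returns a dict counting how many files were moved into each class.
    """
    root = Path(root)
    moved = {}
    # sorted() materializes the listing before we create subfolders in root.
    for path in sorted(root.iterdir()):
        if not (path.is_file() and path.suffix.lower() in extensions):
            continue  # skip folders and non-image files
        label = path.name.split(".", 1)[0]  # 'cat.13.jpg' -> 'cat'
        class_dir = root / label
        class_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(class_dir / path.name))
        moved[label] = moved.get(label, 0) + 1
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("cat.13.jpg", "cat.11.jpg", "dog.1.jpg"):
            Path(tmp, name).touch()
        print(split_into_class_folders(tmp))  # {'cat': 2, 'dog': 1}
        print(sorted(p.name for p in Path(tmp).iterdir()))  # ['cat', 'dog']
```

After running it once on `cat_dog_data/train`, the original code with `data_dir = 'cat_dog_data/train'` should find two classes; ImageFolder sorts class names, so `cat` would map to index 0 and `dog` to index 1.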
Thanks for your help.