Dataset from Kaggle not having the expected structure

Hi, I am new to PyTorch and am trying to use a Kaggle dataset to train a neural network.

The dataset is a classic one (dogs vs cats) and can be found here:

However, when I download it and run

import torch
from torchvision import datasets, transforms

data_dir = 'cat_dog_data/train'

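# The transform list is truncated in the post; CenterCrop(224) and ToTensor() are assumed
transform = transforms.Compose([transforms.Resize(255),
                                transforms.CenterCrop(224),
                                transforms.ToTensor()])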
dataset = datasets.ImageFolder(data_dir, transform=transform)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

I get the following error:

RuntimeError: Found 0 files in subfolders of: PATHTOTHEFOLDER/intro-to-pytorch/cat_dog_data/train
Supported extensions are: .jpg,.jpeg,.png,.ppm,.bmp,.pgm,.tif,.tiff,.webp
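
A quick way to see what ImageFolder has to work with is to count the image files under each immediate subfolder of the root, since each subfolder is treated as one class and files lying directly in the root are ignored. This is only a rough sketch, not torchvision's own logic: the helper names are made up for illustration, and the extension list is copied from the error message.

```python
from pathlib import Path

# Extensions copied from the error message; not read from torchvision itself.
IMG_EXTENSIONS = {".jpg", ".jpeg", ".png", ".ppm", ".bmp",
                  ".pgm", ".tif", ".tiff", ".webp"}


def is_image(path):
    """True if path is a regular file with one of the listed extensions."""
    return path.is_file() and path.suffix.lower() in IMG_EXTENSIONS


def count_images_per_subfolder(root):
    """Return {subfolder name: number of image files anywhere below it}."""
    counts = {}
    for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[sub.name] = sum(1 for f in sub.rglob("*") if is_image(f))
    return counts


def count_loose_images(root):
    """Count image files sitting directly in root (no class folder)."""
    return sum(1 for f in Path(root).iterdir() if is_image(f))
```

Calling count_images_per_subfolder('cat_dog_data/train') and getting an empty dict, while count_loose_images('cat_dog_data/train') is large, would mean the images sit directly in train with no class folders around them.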

If I remove /train from data_dir, the error goes away. However, I think that would be wrong, since ImageFolder treats each subfolder of data_dir as a class, so the structure I should have is:


Instead, I have something like this


Should I manually create dogs and cats folders and strip the dog. and cat. prefixes from the .jpg filenames? That seems odd, but I don’t see how else to proceed.

Thanks for your help.

It was as I thought: the dataset I downloaded was not correctly structured.

I found this out from this issue, which also links to a correctly structured dataset.
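
For anyone who would rather keep the Kaggle download, the flat files could also be sorted into class folders with a short script. This is a minimal sketch and not something taken from the issue: it assumes the files are named like cat.0.jpg and dog.0.jpg (label, then a dot, then the rest), and the function name and the destination path are made up for illustration. Renaming the files is not needed, because ImageFolder takes the label from the folder name only.

```python
import shutil
from pathlib import Path


def sort_into_class_folders(src_dir, dst_dir):
    """Copy '<label>.<rest>.<ext>' images from src_dir into dst_dir/<label>/.

    Returns {label: number of files copied}. Files that do not match the
    assumed naming pattern are skipped, and the originals are left in place.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    copied = {}
    for f in sorted(src_dir.iterdir()):
        if not f.is_file() or f.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            continue
        parts = f.name.split(".")
        if len(parts) < 3:  # e.g. '123.jpg' carries no label prefix
            continue
        label = parts[0]  # 'cat.0.jpg' -> 'cat'
        target = dst_dir / label
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, target / f.name)
        copied[label] = copied.get(label, 0) + 1
    return copied
```

After something like sort_into_class_folders('cat_dog_data/train', 'cat_dog_sorted/train'), pointing data_dir at the new folder should give ImageFolder one subfolder per class, here named cat and dog after the filename prefixes.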

I hope this will help someone else in my situation.