Reading GIF and octet-stream images via ImageFolder failed

I scraped a few images, but a lot of them are GIF or octet-stream files. It seems that ‘ImageFolder’ silently ignores these kinds of images. For example, the image_datasets below has only 2 data points, because only 2 of the files in this folder are JPGs; the others are GIF or octet-stream.

image_datasets = datasets.ImageFolder(os.path.join(data_dir, validation))
 Number of datapoints: 2

Is there a more general way to read the image files?

The most flexible way to load your data would be to write your own Dataset.
Basically you just have to worry about three functions:

  • __init__: here you pass your data, or the paths to your data if you want to load it lazily. You can also define the transformations here that should be applied to your data (and target).
  • __getitem__: here you implement the logic to load a single sample. E.g. if you’ve passed paths to your Dataset, you can get the current path using index and use whatever library you want to load the sample.
  • __len__: returns the length of your Dataset.

Here is a small example:

from PIL import Image
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, gif_paths):
        self.paths = gif_paths

    def __getitem__(self, index):
        x = Image.open(self.paths[index])
        # Convert the gif here to frames, etc.
        ...
        return x

    def __len__(self):
        return len(self.paths)

gif_paths = ['./a.gif', './b.gif']
dataset = MyDataset(gif_paths)
loader = DataLoader(
    dataset,
    batch_size=10,
    num_workers=2,
    shuffle=True
)
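
The "convert the gif here to frames" step in __getitem__ could be filled in along these lines. This is only a minimal sketch, assuming Pillow's ImageSequence module is used for decoding; load_gif_frames and max_frames are hypothetical names, not part of the example above.

```python
from PIL import Image, ImageSequence


def load_gif_frames(path, max_frames=None):
    """Return a list of RGB PIL images, one per frame of the GIF at `path`.

    `max_frames` is a hypothetical cap so that long animations stay cheap to load.
    """
    frames = []
    with Image.open(path) as img:
        for i, frame in enumerate(ImageSequence.Iterator(img)):
            if max_frames is not None and i >= max_frames:
                break
            # convert() returns a new image, so the frame stays usable
            # after the underlying file is closed.
            frames.append(frame.convert("RGB"))
    return frames
```

Note that the default collate function of DataLoader cannot batch raw PIL images, so each frame would still need a transform such as torchvision.transforms.ToTensor before batching, and GIFs with different frame counts would need a custom collate_fn.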

Hi ptrblck

Thanks for your reply. I would like to utilize the existing functionality of the original ImageFolder dataset as much as I can so that I can easily read labels and apply torchvision.transforms later on.

I looked at the source code of ImageFolder. The main problem is a variable called IMG_EXTENSIONS, which lists the extensions of the files that the default image loader will load. Any file whose extension is not in this list is ignored during loading.

IMG_EXTENSIONS = [
    '.jpg', '.JPG', '.jpeg', '.JPEG',
    '.png', '.PNG', '.ppm', '.PPM', '.bmp', '.BMP',
]

Is there any way I can just override this list and keep the rest of the functionality of the ‘ImageFolder’ class?

You could use DatasetFolder and provide the extensions yourself.
You will also need to provide a loader that loads and processes the .GIF files and returns a sample.


Hi ptrblck

Thanks for your suggestion. I implemented a customized DatasetFolder. However, I ran into an issue, which I reported here: "Iterate through customized DatasetFolder does not work".

Do you have any suggestions?

Best