I scraped a few images, but many of them are GIFs or were served as octet-stream (application/octet-stream). It seems that ImageFolder silently ignores these files. For example, image_datasets below contains only 2 data points, because only 2 images in this folder are JPGs; the others are GIF or octet-stream files.
image_datasets = datasets.ImageFolder(os.path.join(data_dir, validation))
Number of datapoints: 2
Is there a more general way to read these image files?
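One thing worth checking first: files served as application/octet-stream are often ordinary JPEG, PNG or GIF images that were saved without a recognized extension. ImageFolder decides which files to load from the file name alone, so a quick diagnostic is to list the files an extension filter would skip and look at their first bytes (the "magic number" each format starts with). This is a minimal sketch; `sniff_image_format`, `find_skipped` and the `ALLOWED` tuple are hypothetical names, and `ALLOWED` is only assumed to resemble ImageFolder's default list.

```python
import os

# Leading byte signatures ("magic numbers") of common image formats.
MAGIC = {
    b"\xff\xd8\xff": "JPEG",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
    b"BM": "BMP",
}

# Assumed allowed extensions, similar to ImageFolder's default list (no GIF).
ALLOWED = (".jpg", ".jpeg", ".png", ".ppm", ".bmp")


def sniff_image_format(path):
    """Return the image format detected from the file header, or None."""
    with open(path, "rb") as f:
        header = f.read(8)
    for magic, fmt in MAGIC.items():
        if header.startswith(magic):
            return fmt
    return None


def find_skipped(root, allowed=ALLOWED):
    """List (path, detected_format) for files an extension filter would skip."""
    skipped = []
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(allowed):
                path = os.path.join(dirpath, name)
                skipped.append((path, sniff_image_format(path)))
    return skipped
```

If most skipped files come back as "JPEG" or "PNG", renaming them or accepting them with a header-based check is enough; files reported as None are probably not images at all.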
The most flexible way to load your data would be to write your own custom Dataset.
Basically, you only have to implement three methods:
__init__: pass in your data, or the paths to your data if you want to load it lazily. You can also define here the transformations that should be applied to your data (and target).
__getitem__: implement the logic to load a single sample. E.g. if you passed file paths to your Dataset, you can get the current path using the index and then load the sample with whatever library you like.
__len__: return the length of your Dataset, i.e. the number of samples.
Here is a small example:
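from PIL import Image
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms.functional as TF

class MyDataset(Dataset):
    def __init__(self, gif_paths):
        self.paths = gif_paths

    def __getitem__(self, index):
        x = Image.open(self.paths[index])
        # Convert the gif here to frames, etc. (this sketch keeps only the first frame)
        x = x.convert('RGB')
        return TF.to_tensor(x)  # the default collate_fn expects tensors, not PIL images

    def __len__(self):
        return len(self.paths)

gif_paths = ['./a.gif', './b.gif']
dataset = MyDataset(gif_paths)
loader = DataLoader(dataset)  # add batch_size, shuffle, num_workers, ... as needed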
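The "convert the gif to frames" step can be done with PIL's `ImageSequence.Iterator`, which walks over every frame of an animated GIF. A sketch, assuming you want each frame as a separate RGB image; the helper name `gif_to_frames` and the RGB conversion are illustrative choices, not part of the example above.

```python
from PIL import Image, ImageSequence


def gif_to_frames(path_or_file):
    """Return every frame of a (possibly animated) GIF as an RGB PIL image."""
    with Image.open(path_or_file) as gif:
        # convert() returns a new, fully loaded image, so the frames
        # stay usable after the file is closed.
        return [frame.convert("RGB") for frame in ImageSequence.Iterator(gif)]
```

Inside `__getitem__` you could then stack the frames into a `(num_frames, 3, H, W)` tensor, e.g. `torch.stack([torchvision.transforms.functional.to_tensor(f) for f in frames])`. Note that the default collate function can only batch such tensors if all GIFs in a batch have the same number of frames and size.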
Thanks for your reply. I would like to utilize the existing functionality of the original ImageFolder dataset as much as I can so that I can easily read labels and apply torchvision.transforms later on.
I looked at the source code of ImageFolder. The main problem is a module-level constant called IMG_EXTENSIONS, which lists the file extensions ImageFolder accepts. Any file whose extension is not in this list is skipped when the dataset is built.
IMG_EXTENSIONS = [
    '.jpg', '.JPG', '.jpeg', '.JPEG',
    '.png', '.PNG', '.ppm', '.PPM', '.bmp', '.BMP',
]
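The filtering behind this list amounts to a suffix check on the file name. The sketch below reproduces that idea with `.gif` added. It lowercases the name before comparing, which I believe is what recent torchvision releases do; the version quoted above presumably compared case-sensitively, hence each extension appearing twice. `GIF_EXTENSIONS` and `is_allowed` are hypothetical names.

```python
# Hypothetical extended list: the default extensions plus GIF, all lowercase.
GIF_EXTENSIONS = (".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".gif")


def is_allowed(filename, extensions=GIF_EXTENSIONS):
    """Case-insensitive suffix check on the file name."""
    # str.endswith accepts a tuple and is True if any suffix matches.
    return filename.lower().endswith(tuple(extensions))
```

An extension check alone still cannot pick up octet-stream files that have no extension at all; for those, a check on the file contents is needed.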
Is there a way to simply override this constant while preserving the rest of the ImageFolder functionality?
You could use DatasetFolder and provide the extensions yourself. You might also want to provide a custom loader that opens and processes the .GIF files and returns a sample.
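A minimal sketch of this suggestion, assuming the usual one-subfolder-per-class layout under `root`. `DatasetFolder` takes either `extensions` or an `is_valid_file` callable, not both; using `is_valid_file` with a header check also picks up the extensionless octet-stream files. The helpers `is_image_file`, `gif_loader` and `build_dataset` are hypothetical, and keeping only the first GIF frame is an assumption.

```python
from PIL import Image

# Header bytes of the formats this sketch accepts (assumed set).
_SIGNATURES = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n", b"GIF87a", b"GIF89a")


def is_image_file(path):
    """Accept a file if its header looks like JPEG, PNG or GIF, whatever its name."""
    with open(path, "rb") as f:
        return f.read(8).startswith(_SIGNATURES)


def gif_loader(path):
    """Open a PIL-readable image and return its first frame as RGB."""
    with Image.open(path) as img:
        img.seek(0)  # first frame of an animated GIF; a no-op for still images
        return img.convert("RGB")


def build_dataset(root, transform=None):
    # Imported here so the helpers above can be used without torchvision.
    from torchvision.datasets import DatasetFolder

    return DatasetFolder(
        root,
        loader=gif_loader,
        extensions=None,  # must be None when is_valid_file is given
        transform=transform,
        is_valid_file=is_image_file,
    )
```

Because the loader returns a PIL image, torchvision.transforms can be passed as `transform` exactly as with ImageFolder, and class labels still come from the subfolder names. Recent torchvision versions also accept `is_valid_file=is_image_file` directly in ImageFolder, which keeps its default PIL-based loader.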
Thanks for your suggestion. I implemented a customized DatasetFolder, but I ran into an issue, which I reported here: "Iterate through customized DatasetFolder does not work".
Do you have any suggestions?