MNIST dataloader - return an extra item, e.g. the mean pixel value of the image

Hello, I would like the MNIST dataset (torchvision package) to return an extra item, e.g. the mean pixel value of the image, along with the image and the target. I have created a subclass which overrides the __getitem__ method and returns the extra item.

from PIL import Image, ImageStat
import torchvision.datasets as dataset
import torchvision.transforms as transforms
from torch.utils.data import DataLoader


class MNIST1(dataset.MNIST):

    def __getitem__(self, index):
        """
        Args:
            index (int): Index

        Returns:
            dict: {"image", "target", "mean_pixel"} where target is the index of the target class.
        """
        img, target = super(MNIST1, self).__getitem__(index)

        # doing this so that it is consistent with all other datasets
        # to return a PIL Image
        img = Image.fromarray(img.numpy(), mode='L')
        # the extra item to be returned
        mean_pixel = ImageStat.Stat(img).mean
        if self.transform is not None:
            img = self.transform(img)

        if self.target_transform is not None:
            target = self.target_transform(target)

        sample = {"image": img, "target": target, "mean_pixel": mean_pixel}
        return sample

The data loading part:

trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (1.0,))])
train_set = MNIST1(root=root, train=True, transform=trans, download=True)
test_set = MNIST1(root=root, train=False, transform=trans, download=True)

batch_size = 100
train_loader = DataLoader(dataset=train_set, batch_size=batch_size,
                          shuffle=True, pin_memory=True)
test_loader = DataLoader(dataset=test_set, batch_size=batch_size,
                         shuffle=True, pin_memory=True)

Am I doing this correctly? I am facing errors while trying to iterate over the dataloader.
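For reference, a minimal sketch of the iteration I mean, assuming the dict batches come out of the train_loader defined above as written:

# sketch of the iteration that raises the error below; the keys match
# the dict returned by MNIST1.__getitem__ above
for batch in train_loader:
    img, target = batch["image"], batch["target"]
    mean_pixel = batch["mean_pixel"]
    break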

ValueError: Too many dimensions: 3 > 2.

This comes from the Image.fromarray(img.numpy(), mode='L') part. Please suggest a solution.

img will have 3 dimensions ([1, 28, 28]), which yields this error.
The reason for the additional channel dimension is that each sample is already transformed inside the super().__getitem__() call.
In particular, ToTensor (not Normalize) adds the channel dimension when it converts the PIL image to a tensor.
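A quick way to see this (a minimal sketch, assuming only numpy and torchvision are installed):

import numpy as np
from PIL import Image
from torchvision import transforms

pil_img = Image.fromarray(np.zeros((28, 28), dtype=np.uint8), mode='L')
tensor_img = transforms.ToTensor()(pil_img)
print(tensor_img.shape)  # torch.Size([1, 28, 28]) -- channel dim added
# Image.fromarray(tensor_img.numpy(), mode='L')  # raises "Too many dimensions: 3 > 2"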

The solution would be to override the complete __getitem__ method or to skip the additional transformation.
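For example, one hypothetical sketch (the class name is just for illustration) that avoids re-transforming by computing the mean from the raw uint8 data and letting the parent class handle everything else:

import torchvision.datasets as dataset

class MNISTWithMean(dataset.MNIST):  # hypothetical name, for illustration only

    def __getitem__(self, index):
        # mean of the raw [0, 255] uint8 pixels, computed before any transform runs
        mean_pixel = self.data[index].float().mean().item()
        # the parent class handles the PIL conversion and applies self.transform
        img, target = super().__getitem__(index)
        return img, target, mean_pixel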


Thanks for the clarification; it worked. I would like to examine the performance of an autoencoder on the MNIST dataset with an additional loss term on the mean pixel target. I now calculate the target mean pixels like this:

from PIL import Image, ImageStat
import torchvision.datasets as dataset


class MNIST1(dataset.MNIST):

    def __getitem__(self, index):
        """
        Args:
            index (int): Index

        Returns:
            tuple: (image, target, mean_pixel) where target is the index of the target class.
        """
        img, target = self.data[index], int(self.targets[index])

        # doing this so that it is consistent with all other datasets
        # to return a PIL Image
        img = Image.fromarray(img.numpy(), mode='L')

        # calculating the mean pixel value on the raw PIL image ([0, 255] range)
        stat = ImageStat.Stat(img)
        mean_pixel = stat.mean

        if self.transform is not None:
            img = self.transform(img)

        if self.target_transform is not None:
            target = self.target_transform(target)

        return img, target, mean_pixel
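Note that ImageStat.Stat(img).mean returns a Python list with one float per band (a single band for mode 'L'), so with the default collate_fn each batch carries the means as a list holding one float64 tensor. A minimal sketch, reusing train_set and DataLoader from above:

loader = DataLoader(dataset=train_set, batch_size=4)
img, target, mean_pixel = next(iter(loader))
print(img.shape)            # torch.Size([4, 1, 28, 28])
print(mean_pixel[0].shape)  # torch.Size([4]) -- one mean per sample for the single 'L' band
print(mean_pixel[0].dtype)  # torch.float64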

I have a doubt regarding the mean calculation: I am computing it on the PIL image, where the pixel values range between 0 and 255. However, I normalize the input image as follows:

trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (1.0,))])

If I want to calculate the loss on the mean pixel (assume that the predicted mean pixel value is encoded in one of the entries of the encoder-decoder bottleneck), should the target mean pixel values be calculated on the normalized input or not?
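For reference, with this particular transform the two scales are related by a simple affine map, so whichever convention is chosen can be converted into the other. A minimal sketch of the arithmetic (the helper name is my own):

def pil_mean_to_normalized(m):
    # ToTensor maps pixels to [0, 1] via value / 255, and
    # Normalize((0.5,), (1.0,)) then subtracts 0.5 (std 1.0 leaves the scale unchanged)
    return m / 255.0 - 0.5

print(pil_mean_to_normalized(127.5))  # 0.0 -- mid-gray maps to zero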