Error while resizing images via transforms

Could someone tell me what's wrong here? I suspect that the resize functionality is not upscaling the images properly, but I'm not 100% sure.

import torch
import torchvision

dataset = 'FashionMNIST'
datapath = './data'
ds = getattr(torchvision.datasets, dataset)
transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize((32, 32)),
    torchvision.transforms.ToTensor(),
])
train_set = ds(root=datapath, train=True, download=True, transform=transforms)

# try to turn the raw (60000, 28, 28) data into 3-channel images in place
train_set.data.unsqueeze_(1)
train_set.data = train_set.data.repeat(1, 3, 1, 1)

dummy_data = torch.utils.data.DataLoader(train_set, batch_size=24, shuffle=True, num_workers=4, pin_memory=True)
                                                                                                                                                                                             
x, y = next(iter(dummy_data))

I keep getting the following error:

Original Traceback (most recent call last):
  File "/home/kirk/miniconda3/envs/torch/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/kirk/miniconda3/envs/torch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/kirk/miniconda3/envs/torch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/kirk/miniconda3/envs/torch/lib/python3.6/site-packages/torchvision/datasets/mnist.py", line 92, in __getitem__
    img = Image.fromarray(img.numpy(), mode='L')
  File "/home/kirk/miniconda3/envs/torch/lib/python3.6/site-packages/PIL/Image.py", line 2661, in fromarray
    raise ValueError("Too many dimensions: %d > %d." % (ndim, ndmax))
ValueError: Too many dimensions: 3 > 2.

I think if you comment out these two lines and then print the train_set data size

# train_set.data.unsqueeze_(1)
# train_set.data = train_set.data.repeat(1, 3, 1, 1)
print(train_set.data.size())

gives

torch.Size([60000, 28, 28])

then if you

print(x.shape)

this gives

torch.Size([24, 1, 32, 32])

The FashionMNIST dataset has single-channel images of size 28x28, which we resized to 32x32.
Resize has no issue; it works as it should.
If you want 3x32x32, I think you will have to pass these images through something like

import torch.nn as nn

c = nn.Conv2d(in_channels=1, out_channels=3, kernel_size=1)
c(x).shape

gives

torch.Size([24, 3, 32, 32])

Thanks a lot. I was hoping to avoid changing the model by adding an additional conv operation just to make the input sizes appropriate.

I tried putting it in my transforms:

transforms.Resize(32),
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, padding=4),
transforms.ToTensor(),
transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
transforms.Lambda(
    lambda x: nn.Conv2d(in_channels=1, out_channels=3, kernel_size=1)(x)
),

but it produces a broadcast error:

Traceback (most recent call last):
  File "/home/kirk/miniconda3/envs/torch/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/kirk/miniconda3/envs/torch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/kirk/miniconda3/envs/torch/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/kirk/miniconda3/envs/torch/lib/python3.6/site-packages/torchvision/datasets/mnist.py", line 95, in __getitem__
    img = self.transform(img)
  File "/home/kirk/miniconda3/envs/torch/lib/python3.6/site-packages/torchvision/transforms/transforms.py", line 61, in __call__
    img = t(img)
  File "/home/kirk/miniconda3/envs/torch/lib/python3.6/site-packages/torchvision/transforms/transforms.py", line 166, in __call__
    return F.normalize(tensor, self.mean, self.std, self.inplace)
  File "/home/kirk/miniconda3/envs/torch/lib/python3.6/site-packages/torchvision/transforms/functional.py", line 217, in normalize
    tensor.sub_(mean[:, None, None]).div_(std[:, None, None])
RuntimeError: output with shape [1, 32, 32] doesn't match the broadcast shape [3, 32, 32]

It seems that what I'm missing is the equivalent of

train_set.data.unsqueeze_(1)
train_set.data = train_set.data.repeat(1, 3, 1, 1)

but for PIL images so that I can hook them up in the transforms.
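
Ideally I'd like the channel expansion to happen inside the Compose itself, e.g. something like this untested sketch (assuming torchvision's Grayscale transform with num_output_channels=3 duplicates the single channel while the image is still a PIL image):

transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize((32, 32)),
    # still a PIL 'L' image here: expand it to 3 identical channels
    torchvision.transforms.Grayscale(num_output_channels=3),
    torchvision.transforms.ToTensor(),
])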

@kirk86 You should ideally provide the image to PIL in (H, W, C) order for resizing (and not as (C, H, W) as you have provided above, hence the error). You can later transpose the tensor to get the shape (N, C, H, W).

P.S.: If you need the input image to be of shape (3, 32, 32), you can use your own code
train_set.data = train_set.data.repeat(1, 3, 1, 1) to repeat the grayscale data along the RGB channels (after the resize). I don't think nn.Conv2d is the right way to do this.

These differing image representations across libraries are really annoying and can cause unnecessary headaches.

I think both OpenCV and Pillow take images in the order (rows, cols) or (rows, cols, depth).
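
For example, just to convince myself about the ordering, a quick sketch with a dummy image:

from PIL import Image
import numpy as np
import torchvision

pil_img = Image.new('RGB', (32, 28))                      # PIL size is (width, height)
print(pil_img.size)                                       # (32, 28)
print(np.array(pil_img).shape)                            # (28, 32, 3) -> (rows, cols, channels)
print(torchvision.transforms.ToTensor()(pil_img).shape)   # torch.Size([3, 28, 32]) -> (C, H, W)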

Can you give a minimal working example of what you explained above for my use case? Everything that I've tried doesn't seem to work, especially because the transforms are only applied when you iterate over the loader.

I've also found that Resize in the transforms only works if there is no normalization after it; otherwise it throws the dimension error above, since the 3-channel mean/std don't match the single-channel image.

You can write your own Dataset class, basically using the existing FashionMNIST with your own custom modifications:

from PIL import Image
from torch.utils.data import Dataset
from torchvision.datasets import FashionMNIST
import torchvision

class CustomDataset(Dataset):
    def __init__(self, root_dir):
        self.train_set = FashionMNIST(root=root_dir, train=True, download=True)
        self.transforms = torchvision.transforms.Compose([
            torchvision.transforms.Resize((32, 32)),
            torchvision.transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.train_set)

    def __getitem__(self, idx):
        # the raw data is a (28, 28) uint8 tensor; convert it to a PIL image
        # so that Resize and ToTensor can be applied
        image = Image.fromarray(self.train_set.data[idx].numpy(), mode='L')
        image = self.transforms(image)        # shape (1, 32, 32)

        # repeat the grayscale channel to get shape (3, 32, 32);
        # the DataLoader adds the batch dimension
        image = image.repeat(3, 1, 1)
        label = self.train_set.targets[idx]

        # do other operations as necessary
        return image, label
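
Then a quick sanity check with a DataLoader (assuming the imports from earlier) should give the batched shape you want:

dataset = CustomDataset(root_dir='./data')
loader = torch.utils.data.DataLoader(dataset, batch_size=24, shuffle=True, num_workers=4)
x, y = next(iter(loader))
print(x.shape)  # expected: torch.Size([24, 3, 32, 32])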

Thanks, the only other alternative I came up with is to use the transforms as before, without the normalization:

transforms.Resize(32),
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, padding=4),
transforms.ToTensor(),

and then, when iterating over the loader, check the channel dim and repeat accordingly:

for x, y in loader:
  if x.shape[1] != 3:
     x = x.repeat(1, 3, 1, 1)
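
I suppose the repeat could also live inside the Compose as a Lambda instead of the per-batch check. An untested sketch (assuming from torchvision import transforms), which should also let the 3-channel Normalize go back in after it:

train_transforms = transforms.Compose([
    transforms.Resize(32),
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    # the tensor is (1, 32, 32) here; repeat the channel dim to get (3, 32, 32)
    transforms.Lambda(lambda x: x.repeat(3, 1, 1)),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])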

Yes, that can also be done!