Hi, I have a question about the size of the images inside consecutive batches passed to a CNN's input.
The problem is that I want to pass batches of the following shape to the CNN: (batch_size, C = 1, H = 254, W = 600-1400). Within each batch I pad the images with zeros to the maximum width in that batch - my images have widths from 600 to 1400 and a fixed height of 254. For example, the first batch has the shape (64, 1, 254, 1200), the next batch has the shape (64, 1, 254, 1000), and so on. The padding is done via a custom collate_fn: I load the images, determine the maximum width in the batch, and then pad the remaining images to that width.
Can batches with varying internal dimensions be fed to the CNN input, or is it necessary to pad all images to the maximum width in the whole dataset?
I will add that I intend to use the CNN only to generate feature maps, which will then be fed as input to a GRU.
Yes, using variable input shapes is possible, e.g. as long as you make sure to create the needed activation shape for layers requiring a static shape. Often, adaptive pooling layers with a predefined output size are used to create a statically shaped activation, which is then fed to a linear layer with a static in_features value.
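A minimal sketch to illustrate the idea (the layer sizes are made up):

```python
import torch
import torch.nn as nn

# Adaptive pooling produces a fixed activation shape regardless of the
# input width, so the Linear layer's in_features stays static.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d((7, 7)),   # output is always (N, 16, 7, 7)
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),
)

# Two batches with different widths both work:
out_a = model(torch.randn(4, 1, 254, 1200))
out_b = model(torch.randn(4, 1, 254, 1000))
print(out_a.shape, out_b.shape)  # torch.Size([4, 10]) twice
```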
Depending on your model and how these feature maps are used afterwards, your approach could also work.
Thank you for the answer. I would like to combine a CNN with a GRU network, and the whole data processing pipeline will look as follows:
1. Load the images in the collate_fn based on the paths returned by __getitem__ (I have decided to return image paths from __getitem__ because of the variable image sizes - the default collation expects samples of equal size, so returning paths seems like a good idea to me).
2. Pad the images with zeros to the maximum width in the batch and create a padding mask so the real image widths (without padding) can be recovered later (also in the collate_fn) - see the sketch after this list.
3. Apply augmentation using transforms.
4. Return the padded and augmented batch from the collate_fn.
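A rough sketch of the collate_fn I have in mind (load_grayscale is a placeholder for my actual image loading, and the transforms are omitted):

```python
import torch
import torch.nn.functional as F

def collate_fn(paths):
    # `load_grayscale` is a placeholder that returns a (1, 254, W)
    # float tensor for a given image path.
    images = [load_grayscale(p) for p in paths]
    widths = torch.tensor([img.shape[-1] for img in images])
    max_w = int(widths.max())

    # Right-pad every image with zeros to the maximum width in the batch.
    padded = torch.stack(
        [F.pad(img, (0, max_w - img.shape[-1])) for img in images]
    )  # (B, 1, 254, max_w)

    # Boolean mask marking the real (unpadded) columns of each image.
    mask = torch.arange(max_w)[None, :] < widths[:, None]  # (B, max_w)

    # Augmentation via transforms would be applied here before returning.
    return padded, mask, widths
```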
The above steps are implemented around a custom Dataset class. Then I would like to pass the batches with padded images to the CNN to generate feature maps and pass the CNN output to the GRU network (seq-to-seq architecture). The CNN's role is just to generate feature maps that are processed later by the GRU, as I mentioned, so I think I do not have to care about a fixed size at the CNN output.
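For instance, a quick sanity check with a toy backbone (not my real model) shows that the output width simply follows the input width:

```python
import torch
import torch.nn as nn

# Toy fully convolutional backbone - no Linear layer on a flattened
# activation, so variable widths pass straight through.
cnn = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

fmap_a = cnn(torch.randn(64, 1, 254, 1200))
fmap_b = cnn(torch.randn(64, 1, 254, 1000))
print(fmap_a.shape)  # torch.Size([64, 32, 127, 600])
print(fmap_b.shape)  # torch.Size([64, 32, 127, 500])
```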
As for the GRU network, I am going to calculate the dimensions of the feature maps based on the widths obtained from the padding masks - the main idea is to compute the feature-map dimensions for the unpadded widths (using the standard equations for the output dimensions of conv/pooling layers) and pass the padded feature maps to the GRU via pack_padded_sequence, with the computed widths passed as the lengths argument to that function.
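A rough sketch of that step (the layer parameters mirror the toy backbone above, and treating the width axis as the GRU's time axis is my assumption about the layout):

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

def conv_out(size, kernel, stride=1, padding=0):
    # Standard output-size formula for conv/pooling layers (dilation = 1).
    return (size + 2 * padding - kernel) // stride + 1

# Real (unpadded) widths recovered from the padding masks, e.g.:
widths = torch.tensor([1200, 800, 1000])

# Propagate the widths through the toy backbone above:
# conv (k=3, s=1, p=1) keeps the width, maxpool (k=2, s=2) halves it.
lengths = conv_out(conv_out(widths, 3, 1, 1), 2, 2)  # tensor([600, 400, 500])

# Suppose `fmap` is the CNN output of shape (B, C, H', W_max').
# Treating the width axis as time, (B, W_max', C * H') is one common layout.
B, C, Hp, Wp = 3, 32, 127, 600
fmap = torch.randn(B, C, Hp, Wp)
seq = fmap.permute(0, 3, 1, 2).reshape(B, Wp, C * Hp)

# pack_padded_sequence then hides the padded timesteps from the GRU.
packed = pack_padded_sequence(
    seq, lengths, batch_first=True, enforce_sorted=False
)
```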
So, in that case: can I pad to the maximum width per batch for the CNN and pass batches with a variable width dimension (all other dimensions fixed)? The width will be fixed within a single batch and will change from batch to batch. Do PyTorch's conv modules handle that case?