Learning with different sizes

I wanted to create an ensemble learning model, where I retrain the same custom model on different image sizes. Is that possible?

If your model is flexible regarding the input shapes, it would be possible.
E.g. for a CNN you could use an adaptive pooling layer before flattening the activation and passing it to the linear layer, so that more flexible spatial shapes are accepted.
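A minimal sketch of this idea (the model and layer sizes are made up for illustration): `nn.AdaptiveAvgPool2d` fixes the spatial size to 1x1 before the linear layer, so only the channel count has to match and the spatial input size can vary.

```python
import torch
import torch.nn as nn

class FlexibleCNN(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # output is always [batch, 32, 1, 1], regardless of the input size
        self.pool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.pool(x)
        x = torch.flatten(x, 1)
        return self.fc(x)

model = FlexibleCNN()
# Different input sizes produce the same output shape:
print(model(torch.randn(1, 3, 64, 64)).shape)    # torch.Size([1, 4])
print(model(torch.randn(1, 3, 128, 128)).shape)  # torch.Size([1, 4])
```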

I’m not sure if I understood your use case correctly, so please feel free to add more information. :slight_smile:


Thanks for the quick reply. Actually, I am a Keras user trying to switch to PyTorch.
I was using a Keras model built on the pretrained Xception model:

base_model0 = Xception(weights='imagenet',include_top=False,input_shape=(None,None,3))
base_model1 = Xception(weights='imagenet',include_top=False,input_shape=(None,None,3))
base_model2 = Xception(weights='imagenet',include_top=False,input_shape=(None,None,3))
base_model3 = Xception(weights='imagenet',include_top=False,input_shape=(None,None,3))
x0 = base_model0.output
x0 = layers.GlobalAveragePooling2D()(x0)
x1 = base_model1.output
x1 = layers.GlobalAveragePooling2D()(x1)
x2 = base_model2.output
x2 = layers.GlobalAveragePooling2D()(x2)
x3 = base_model3.output
x3 = layers.GlobalAveragePooling2D()(x3)
x = layers.concatenate([x0,x1,x2,x3])
x = layers.Dense(4,activation='sigmoid')(x)
model = Model(inputs=(base_model0.input, base_model1.input, base_model2.input, 
    base_model3.input), outputs=x)

Each Xception base model has an input shape of (None, None, 3), meaning images of any size can be fed in,
where None, None stand for height and width and ‘3’ for the number of channels.
The output of each base model then goes through global average pooling, and the pooled features are concatenated to produce the final output.
Although the input shapes are different, after the global average pooling the feature shapes are the same.
I just wanted to know if I could implement this in PyTorch.

I don’t know how the Xception model is implemented in Keras or what GlobalAveragePooling2D does exactly (is it returning a single scalar per channel for each sample?).
However, if you are using adaptive pooling layers (which define the output size instead of the kernel size), your spatial input shapes would be flexible.
This line shows the usage in the ResNet implementation.

Since PyTorch does not use any placeholder variables, you don’t have to set None for specific shapes.
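If GlobalAveragePooling2D averages each channel over its spatial dimensions (as the name suggests), `nn.AdaptiveAvgPool2d((1, 1))` would play the same role, and your ensemble could be sketched in PyTorch roughly like this (the backbones here are small stand-in CNNs, not Xception):

```python
import torch
import torch.nn as nn

def make_backbone(out_channels=32):
    # stand-in for a pretrained backbone such as Xception
    return nn.Sequential(
        nn.Conv2d(3, out_channels, 3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d((1, 1)),  # "global average pooling"
        nn.Flatten(),                  # [batch, out_channels]
    )

class Ensemble(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbones = nn.ModuleList([make_backbone() for _ in range(4)])
        self.fc = nn.Linear(4 * 32, 4)

    def forward(self, inputs):  # one input tensor per backbone
        feats = [b(x) for b, x in zip(self.backbones, inputs)]
        return torch.sigmoid(self.fc(torch.cat(feats, dim=1)))

model = Ensemble()
# Inputs of different spatial sizes work, since each branch pools to 1x1:
inputs = [torch.randn(2, 3, s, s) for s in (64, 96, 128, 224)]
out = model(inputs)
print(out.shape)  # torch.Size([2, 4])
```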


Thanks for the clarification.
Are average pooling (in the example from your link) and adaptive pooling the same?
Are there some example codes or links where an adaptive pooling layer is first used to bring the input to the desired shape before passing it to the model?

Average and adaptive pooling can be combined; these names do not refer to the same layer.
In the linked example, an nn.AdaptiveAvgPool2d layer is used. The docs give you an example of how to use it, but let me know if you get stuck.
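To make the difference concrete (a small sketch, the sizes are arbitrary): a regular pooling layer defines the kernel size, so its output size depends on the input, while an adaptive pooling layer defines the output size directly.

```python
import torch
import torch.nn as nn

avg = nn.AvgPool2d(kernel_size=2)        # output size depends on the input
adaptive = nn.AdaptiveAvgPool2d((7, 7))  # output is always 7x7

x = torch.randn(1, 64, 56, 56)
print(avg(x).shape)       # torch.Size([1, 64, 28, 28])
print(adaptive(x).shape)  # torch.Size([1, 64, 7, 7])

x = torch.randn(1, 64, 100, 100)
print(adaptive(x).shape)  # still torch.Size([1, 64, 7, 7])
```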

Hi again. As you said, I have to do adaptive pooling before sending the input to the network.
For example, VGG16 requires input images of size 224x224x3, but we want to train on images of a different size (e.g. 1024x1024x3).
Should the code look like this?

model = models.vgg16(pretrained=False)
first_layer = [nn.AdaptiveMaxPool2d((224, 224))]
model.features = nn.Sequential(*first_layer)

Also, I now have to freeze my model from the 2nd layer on, and only need to train the 1st layer and the classification part. How can I do that?

No, you don’t need to add the adaptive pooling layer at the beginning of the model.
vgg16 uses this layer internally before feeding the activation to the linear layer in this line of code.

You can just pass inputs of different shapes directly:

import torch
from torchvision import models

model = models.vgg16()

x = torch.randn(1, 3, 224, 224)
out = model(x)

x = torch.randn(1, 3, 1024, 1024)
out = model(x)  # works as well, thanks to the internal adaptive pooling layer

Training the adaptive pooling layer wouldn’t be necessary, as pooling layers don’t have trainable parameters.
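If you still want to freeze the feature extractor and train only the classifier, the usual pattern is to set requires_grad=False on the frozen parameters and pass only the trainable ones to the optimizer. A sketch using a small stand-in model (the same pattern applies to models.vgg16, freezing model.features and training model.classifier):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),   # "features" (to be frozen)
    nn.AdaptiveAvgPool2d((1, 1)),    # no parameters, nothing to freeze here
    nn.Flatten(),
    nn.Linear(8, 4),                 # "classifier" (stays trainable)
)

# freeze the conv layer
for param in model[0].parameters():
    param.requires_grad = False

# only optimize the parameters that still require gradients
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3)
print(sum(p.numel() for p in trainable))  # 36: only the linear layer (8*4 + 4)
```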
