Concatenating images

I don’t know if EfficientNet implementations use normalized/standardized inputs and if so, what the reason would be.
Did you check some reference implementations (maybe it’s mentioned in the code) or the paper?
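
For reference, a common ImageNet normalization used by many torchvision pretrained models looks like the snippet below; whether the EfficientNet reference implementation uses the same statistics would need to be checked against its code or the paper:

import torchvision.transforms as transforms

# widely used ImageNet mean/std statistics; these are an assumption here,
# not confirmed values from the EfficientNet reference implementation
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])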


Thank you.
Excuse me, another question: is it possible to add a convolution layer in transfer learning?
When using transfer learning, I added a torch.nn.Conv2d layer as the last layer, but I got this error:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [512, 1280, 1, 1], but got 2-dimensional input of size [8, 1280] instead

How can I fix this error when I do not have access to self.forward?

Thank you so much

Usually you would create a model object (e.g. via model = MyModel()) and would thus need access to the source code of the model.
However, if you cannot access it for some reason, you could add an nn.Unflatten layer in front of the new conv layer so that the inputs are 4-dimensional again.
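
As a minimal sketch, assuming the backbone yields the flattened [8, 1280] activation from the error message and the new layer is the 512-channel 1x1 conv implied by the weight shape [512, 1280, 1, 1]:

import torch
import torch.nn as nn

# nn.Unflatten restores the 4D shape the conv layer expects:
# [8, 1280] -> [8, 1280, 1, 1]
head = nn.Sequential(
    nn.Unflatten(1, (1280, 1, 1)),
    nn.Conv2d(1280, 512, kernel_size=1)
)

x = torch.randn(8, 1280)
out = head(x)
print(out.shape)
> torch.Size([8, 512, 1, 1])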


Thank you so much 🙏 🙏

Hello again. Excuse me, another question:
Does using nn.AdaptiveAvgPool2d((1,1)) make sense?
In this case, is the filter size equal to the input size?
Why is this used, and what happens in this case?

Adaptive pooling layers can be used to create a defined output shape, which allows your model to work with variable input shapes. E.g., torchvision models use adaptive pooling layers after the feature extractor and before feeding the activation to the first linear layer to allow different input shapes.
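
As a quick sketch using torchvision's resnet18 (which places such an adaptive pooling layer before its final linear layer), different spatial input sizes yield the same output shape:

import torch
import torchvision

model = torchvision.models.resnet18()
# both spatial sizes work, since the adaptive pooling layer creates a
# fixed 1x1 activation before the final linear layer
out1 = model(torch.randn(1, 3, 224, 224))
out2 = model(torch.randn(1, 3, 256, 256))
print(out1.shape, out2.shape)
> torch.Size([1, 1000]) torch.Size([1, 1000])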


Thank you very much
What does output_size=(1,1) mean?

And is it possible to use the pooling layer between fully connected layers?

I used the classifier layer as below.

Linear(1280, 512), Unflatten(), AdaptiveAvgPool2d((1,1)), Flatten(), Dropout(), Linear(512, 256), Unflatten(), AdaptiveAvgPool2d((1,1)), Flatten(), Dropout(), Linear(256, 6)

Does having the pooling layers here make any difference compared to not having them?

The output_size defines the spatial size of the output activation of this layer as seen here:

import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d(output_size=(1, 1))

# both inputs are pooled to a 1x1 spatial size, regardless of
# their channel dimension or input spatial size
x = torch.randn(2, 3, 24, 24)
out = pool(x)
print(out.shape)
> torch.Size([2, 3, 1, 1])

x = torch.randn(2, 6, 2, 2)
out = pool(x)
print(out.shape)
> torch.Size([2, 6, 1, 1])

You can thus pass tensors with different input shapes to this layer and will get the defined spatial output size.

Yes, that’s possible.

Assuming the first linear layer creates a 2D activation in the shape [batch_size, 512], the Unflatten and AdaptiveAvgPool2d layers won’t do anything, since the spatial shape would already be 1x1, as seen here:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(512, 256)
)

# a 4D input which already has a 1x1 spatial size
x = torch.randn(2, 512, 1, 1)
out1 = model(x)
# applying only the linear layer to the manually flattened input yields
# the same result, i.e. the pooling changes nothing for a 1x1 input
out2 = model[2](x.view(x.size(0), -1))
print((out1 - out2).abs().max())
> tensor(0., grad_fn=<MaxBackward1>)

Thank you very much. God reward you.

In the EfficientNet model, the final convolution layer is as follows:

(_conv_head): Conv2dStaticSamePadding(
    320, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False
    (static_padding): Identity()
  )

Is there anything like that in torch.nn? Can torch.nn.Conv2d be used instead?
I just want to change the 1280 output channels!

I don’t know exactly what Conv2dStaticSamePadding does, but based on this comment it seems to be used to export the model, so I guess you should be able to replace it with an nn.Conv2d layer.
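
A minimal sketch, assuming model holds the EfficientNet instance and 512 is a hypothetical new output channel count; note that the subsequent layers (e.g. the following batchnorm and the classifier) would then also need matching channel counts:

import torch.nn as nn

# replace the head with a plain nn.Conv2d; the 320 input channels and the
# 1x1 kernel match the printed Conv2dStaticSamePadding configuration
model._conv_head = nn.Conv2d(320, 512, kernel_size=1, stride=1, bias=False)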


Thank you so much 🙏 🙏