I don’t know if EfficientNet implementations use normalized/standardized inputs and if so, what the reason would be.

Did you check some reference implementations (maybe it’s mentioned in the code) or the paper?
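For reference, many pretrained vision models use ImageNet-style normalization; whether a specific EfficientNet implementation expects the same statistics is something you would need to verify against its code or paper. A minimal sketch of this common preprocessing:

```
from torchvision import transforms

# Common ImageNet-style preprocessing (a sketch; the exact mean/std a given
# EfficientNet implementation expects should be checked in its reference code).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```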

Thank you.

Excuse me, another question.

Is it possible to add a convolution layer in transfer learning?

When using transfer learning, I added a `torch.nn.Conv2d` layer as the last layer, but I got this error:

`RuntimeError: Expected 4-dimensional input for 4-dimensional weight [512, 1280, 1, 1], but got 2-dimensional input of size [8, 1280] instead`

How can I fix this error when I do not have access to `self.forward`?

Thank you so much

Usually you would create a model object (e.g. via `model = MyModel()`) and would thus need access to the source code of the model.

However, if you cannot access it for some reason, you could add an `nn.Unflatten` layer in front of the new conv layer so that the inputs are 4-dimensional again, as in the sketch below.
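A minimal sketch of this workaround, assuming the pretrained backbone outputs a 2D activation of shape `[batch_size, 1280]` (the shapes are taken from your error message; `new_head` is a hypothetical name):

```
import torch
import torch.nn as nn

# Unflatten restores the spatial dimensions so the conv layer sees 4D input.
new_head = nn.Sequential(
    nn.Unflatten(1, (1280, 1, 1)),        # [8, 1280] -> [8, 1280, 1, 1]
    nn.Conv2d(1280, 512, kernel_size=1),  # matches the weight [512, 1280, 1, 1]
)

x = torch.randn(8, 1280)
print(new_head(x).shape)
# torch.Size([8, 512, 1, 1])
```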

Thank you so much

Hello sir, good day. Excuse me, another question.

Does using `nn.AdaptiveAvgPool2d((1,1))` make sense?

In this case, is the filter size equal to the input size?

Why is this used?

What happens in this case?

Adaptive pooling layers can be used to create a defined output shape, which could allow your model to work with variable input shapes. E.g., `torchvision` models use adaptive pooling layers after the feature extractor and before feeding the activation to the first linear layer to allow different input shapes.
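For example, this small check (assuming `torchvision` is installed) shows a ResNet accepting two different input resolutions thanks to its adaptive pooling layer:

```
import torch
from torchvision import models

# The AdaptiveAvgPool2d before the final Linear layer makes the model
# independent of the input resolution (within reasonable limits).
model = models.resnet18()
for size in (224, 320):
    x = torch.randn(1, 3, size, size)
    print(model(x).shape)
# torch.Size([1, 1000])
# torch.Size([1, 1000])
```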

Thank you very much

What does `output_size=(1,1)` mean?

And is it possible to use a pooling layer between fully connected layers?

I used the following classifier:

`Linear(1280,512), Unflatten(), AdaptiveAvgPool2d((1,1)), Flatten(), Dropout(), Linear(512,256), Unflatten(), AdaptiveAvgPool2d((1,1)), Flatten(), Dropout(), Linear(256,6)`

Is having a pooling layer here any different from not having it?

The `output_size` defines the spatial size of the output activation of this layer, as seen here:

```
import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d(output_size=(1, 1))

x = torch.randn(2, 3, 24, 24)
out = pool(x)
print(out.shape)
# torch.Size([2, 3, 1, 1])

x = torch.randn(2, 6, 2, 2)
out = pool(x)
print(out.shape)
# torch.Size([2, 6, 1, 1])
```

You can thus pass tensors with different input shapes to this layer and will get the defined spatial output size.

Yes, that’s possible.

Assuming the first linear layer creates a 2D activation in the shape `[batch_size, 512]`, the `Unflatten` and `AdaptiveAvgPool2d` layers won’t do anything, since the spatial shape would already be `1x1`, as seen here:

```
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(512, 256),
)

x = torch.randn(2, 512, 1, 1)
out1 = model(x)                         # pool + flatten + linear
out2 = model[2](x.view(x.size(0), -1))  # linear only, pooling skipped
print((out1 - out2).abs().max())
# tensor(0., grad_fn=<MaxBackward1>)
```

Thank you very much, and may God reward you.

In the EfficientNet model, the final convolution layer is as follows:

```
(_conv_head): Conv2dStaticSamePadding(
  320, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False
  (static_padding): Identity()
)
```

Is there anything like that in `torch.nn`? Can `torch.nn.Conv2d` be used instead? I just want to change the 1280 output channels!

I don’t know exactly what `Conv2dStaticSamePadding` does, but based on this comment it seems to be used to export the model, so I guess you should be able to replace it with an `nn.Conv2d` layer; see the sketch below.
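A hedged sketch of such a replacement, assuming the efficientnet-pytorch implementation, which exposes `_conv_head`, `_bn1`, and `_fc` attributes (other implementations may use different names), and keeping in mind that the layers after the conv head were built for 1280 channels and must be adapted as well:

```
import torch.nn as nn

def shrink_conv_head(model, out_channels=512, num_classes=6):
    # Replace the 320 -> 1280 conv head with a plain nn.Conv2d.
    model._conv_head = nn.Conv2d(320, out_channels, kernel_size=1,
                                 stride=1, bias=False)
    # Downstream layers expect 1280 input channels, so adapt them too.
    model._bn1 = nn.BatchNorm2d(out_channels)
    model._fc = nn.Linear(out_channels, num_classes)
    return model
```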

Thank you so much

Thank you for your explanation. To illustrate your point, here is a command to train resnet50 from torchvision:

```
torchrun --nproc_per_node=8 train.py --model resnet50 --batch-size 128 --lr 0.5 \
--lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear \
--auto-augment ta_wide --epochs 600 --random-erase 0.1 --weight-decay 0.00002 \
--norm-weight-decay 0.0 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 \
--train-crop-size 176 --model-ema --val-resize-size 232 --ra-sampler --ra-reps 4
```

From my understanding, the argument `--train-crop-size 176` decides the size of the input image during training. As it’s different from the default ImageNet image size (224x224), an adaptive pooling layer is essential in this case.

Printing torchvision’s resnet50 gives:

```
# previous layers
    (2): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=2048, out_features=1000, bias=True)
)
```