How to use linear layer after AdaptiveAvgPool2d?

n0obcoder · June 3, 2019, 10:38am

When you load the model with
model = models.resnet18(pretrained=True)
and print it, you see an AdaptiveAvgPool2d layer followed by a linear layer at the end of the architecture.
I don't understand how this works without flattening the output of the AdaptiveAvgPool2d layer.

How do I use a linear layer after the AdaptiveAvgPool2d layer?

Consider this example:

import torch
import torch.nn as nn

m = nn.AdaptiveAvgPool2d((5, 7))
dense = nn.Linear(5, 10)

input = torch.randn(1, 64, 8, 9)

output = m(input)  # its shape should be (1, 64, 5, 7)
output2 = dense(output)  # this line throws an error.

Please help me with this. Thanks in advance!

ptrblck · June 3, 2019, 10:52am

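The activation is reshaped (flattened) in the forward method of the resnet implementation, between the avgpool and fc layers, as can be seen here. Printing the model only lists the registered modules, so operations applied directly in forward do not show up in the output.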
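As a minimal sketch of the same idea applied to the example from the question: flatten everything except the batch dimension before the linear layer, and size the layer's in_features as channels * height * width. The variable names below are illustrative, and torch.flatten is only one way to do the reshape; depending on the torchvision version, the ResNet code may use view or reshape instead.

```python
import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d((5, 7))
# in_features must match the flattened size: 64 channels * 5 * 7 = 2240
dense = nn.Linear(64 * 5 * 7, 10)

x = torch.randn(1, 64, 8, 9)

pooled = pool(x)                 # shape: (1, 64, 5, 7)
flat = torch.flatten(pooled, 1)  # keep the batch dim, flatten the rest: (1, 2240)
out = dense(flat)                # shape: (1, 10)

print(pooled.shape, flat.shape, out.shape)
```

nn.Linear only acts on the last dimension of its input, which is why passing the 4D tensor directly fails here: the last dimension is 7, while the layer was built for 5 input features.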

n0obcoder · June 3, 2019, 10:59am

That answered my question, thanks a lot. But what is the point of reducing the feature map to (1, 1), as can be seen in the line

AdaptiveAvgPool2d(output_size=(1, 1))

Wouldn’t it be better to have a bigger output_size?

ptrblck · June 3, 2019, 11:02am

It might be better for some use cases. However, the current implementation just sticks to the original ResNet paper.
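One practical consequence of the pooling size can be sketched as follows: the number of inputs to the final linear layer grows with output_size. The 512 x 7 x 7 tensor below is an assumed stand-in for the last convolutional feature map of resnet18 on a 224 x 224 image, not an actual model output.

```python
import torch
import torch.nn as nn

# Assumed stand-in for resnet18's last conv feature map (224x224 input image)
features = torch.randn(1, 512, 7, 7)

counts = {}
for size in [(1, 1), (2, 2), (4, 4)]:
    pooled = nn.AdaptiveAvgPool2d(size)(features)
    flat = torch.flatten(pooled, 1)
    counts[size] = flat.shape[1]  # in_features the following nn.Linear would need

print(counts)
```

With output_size=(1, 1) the linear layer needs only 512 inputs regardless of the image resolution. A larger output_size keeps some spatial layout but multiplies the layer's parameter count, so the pretrained fc weights would no longer fit and the layer would have to be replaced and retrained.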

n0obcoder · June 3, 2019, 11:15am

My Siamese network’s training loss seems to be fluctuating a lot. Do you mind having a look at the code…

my implementation of a Siamese net using the triplet loss function

ptrblck · June 3, 2019, 11:25am

The link is unfortunately dead (404).

n0obcoder · June 3, 2019, 11:32am

try this

ptrblck · June 3, 2019, 11:40am

The code looks generally fine. Is your loss decreasing at all or just randomly jumping?

n0obcoder · June 3, 2019, 11:42am

It seems to be jumping to me.

This is what the training loss looks like:

Epoch 1/5
loss @ batch#000: 0.8270422220230103
loss @ batch#001: 0.7125710844993591
loss @ batch#002: 0.660284698009491
loss @ batch#003: 0.838929295539856
loss @ batch#004: 1.083276391029358
loss @ batch#005: 0.6964817047119141
loss @ batch#006: 0.4421113133430481
loss @ batch#007: 0.8267079591751099
loss @ batch#008: 0.5320273637771606
loss @ batch#009: 0.14518025517463684
loss @ batch#010: 0.39652758836746216
loss @ batch#011: 0.5080978870391846
loss @ batch#012: 0.0
Epoch 2/5
loss @ batch#000: 0.03382734954357147
loss @ batch#001: 0.21790757775306702
loss @ batch#002: 0.354198694229126
loss @ batch#003: 0.13982316851615906
loss @ batch#004: 0.5416660308837891
loss @ batch#005: 0.0642566978931427
loss @ batch#006: 0.1361594796180725
loss @ batch#007: 0.2777060270309448
loss @ batch#008: 0.1839388906955719
loss @ batch#009: 0.0
loss @ batch#010: 0.006485641002655029
loss @ batch#011: 0.0
loss @ batch#012: 0.08765590190887451
Epoch 3/5
loss @ batch#000: 0.12175476551055908
loss @ batch#001: 0.0
loss @ batch#002: 0.40359675884246826
loss @ batch#003: 0.03469619154930115
loss @ batch#004: 0.3805411159992218
loss @ batch#005: 0.07248783111572266
loss @ batch#006: 0.07092243432998657
loss @ batch#007: 0.438197523355484

ptrblck · June 3, 2019, 11:45am

Try to play around with some hyperparameters (e.g. the learning rate or a different optimizer) and see if the behavior improves. Also, it might be helpful to use a small subset of your data (e.g. just 10 triplets) and try to overfit your model on it. If your model cannot overfit this small data sample, you might have some bugs in your code, or the architecture is just not suitable for the use case.
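The overfitting check could look roughly like the sketch below. Everything in it is a placeholder assumption: the small embed network stands in for the actual Siamese branch, the 10 triplets are random tensors rather than real data, and Adam with lr=1e-2 is just one setting to start from.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the embedding branch of the Siamese network
embed = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))

# 10 fixed triplets of random placeholder data; because they are random,
# the loss can only go down if the model memorizes them
anchor = torch.randn(10, 32)
positive = torch.randn(10, 32)
negative = torch.randn(10, 32)

criterion = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(embed.parameters(), lr=1e-2)

losses = []
for step in range(200):
    optimizer.zero_grad()
    loss = criterion(embed(anchor), embed(positive), embed(negative))
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Since the same 10 triplets are reused at every step, the loss should fall steadily toward zero. If it keeps jumping even in this setting, the problem is more likely in the model, the loss computation, or the optimizer settings than in the noise that comes from sampling different triplets per batch.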