Hi. I am struggling to remove the final classification layer of a pretrained resnet18 (it ends in a Linear fc layer rather than an explicit softmax). I am trying to feed the CNN output into a transformer encoder-decoder network, which expects an input tensor of size [2000]. A similar thread is here.
import torch
import torch.nn as nn
import torchvision.models as models

class ResnetEncoder(nn.Module):
    def __init__(self):
        super(ResnetEncoder, self).__init__()
        resnet = models.resnet18(pretrained=True)
        modules = list(resnet.children())[::-1]
        self.resnet = nn.Sequential(*modules)

    def forward(self, images):
        out = self.resnet(images)  # images: (batch_size * n_frames, 3, 224, 224)
        out = out.view(-1, 2000)
        return out

model = ResnetEncoder()
input_frames = torch.randn(100, 3, 224, 224)
output = model(input_frames)
RuntimeError: size mismatch, m1: [168000 x 224], m2: [512 x 1000] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:41
There is a size mismatch because [::-1] reverses the list of children instead of dropping the last one, so the first layer of self.resnet is now the final Linear classifier, which expects an input of size N x 512:
ResnetEncoder(
  (resnet): Sequential(
    (0): Linear(in_features=512, out_features=1000, bias=True)
    (1): AdaptiveAvgPool2d(output_size=(1, 1))