I think you need to call `.view()` on the output of the ResNet to flatten it to `[1, 2048]` for a single image (or `[batch_size, 2048]` for a batch). Something like this:
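```python
with torch.no_grad():
    features = self.resnet(images)  # assumed shape: [N, 2048, 1, 1]
    features = features.view(features.size(0), -1)  # -> [N, 2048], i.e. [1, 2048] for one image
```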
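To check the shapes yourself, here is a minimal sketch. It uses random tensors of shape `[N, 2048, 1, 1]` as stand-ins for the ResNet output (an assumption about what `self.resnet` returns), and `flatten_features` is just a hypothetical helper name. It shows that `.view(1, 2048)` only fits a single image, while `.view(features.size(0), -1)` keeps the batch dimension.

```python
import torch


def flatten_features(features: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: flatten [N, 2048, 1, 1] features to [N, 2048]."""
    # Keep dim 0 (the batch) and let -1 infer the remaining size (2048 here).
    return features.view(features.size(0), -1)


# Stand-ins for the pooled ResNet output (assumed shape [N, 2048, 1, 1]).
single = torch.randn(1, 2048, 1, 1)
batch = torch.randn(4, 2048, 1, 1)

print(single.view(1, 2048).shape)     # torch.Size([1, 2048])
print(flatten_features(batch).shape)  # torch.Size([4, 2048])

try:
    batch.view(1, 2048)  # 4 * 2048 elements cannot fit in a [1, 2048] view
except RuntimeError as err:
    print("view(1, 2048) fails for a batch of 4:", err)
```

With a batch of one, both versions give the same `[1, 2048]` tensor, so the general form costs nothing when you only process one image at a time.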
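Before the fully connected layer, torchvision's ResNet calls `torch.flatten(x, 1)` inside its `forward`, which changes the pooled output from `[N, 2048, 1, 1]` to `[N, 2048]`. If you removed the fc layer by keeping only the model's child modules, that flatten step is lost as well, so the output stays `[N, 2048, 1, 1]` and you have to reshape it yourself.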
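The end of the ResNet forward pass (avgpool, then flatten, then fc) can be sketched in plain PyTorch. This is a minimal sketch, not torchvision's code: it assumes ResNet-50 sizes (2048 channels, a 7x7 feature map from `layer4` for 224x224 inputs, 1000 classes) and uses a random tensor in place of real `layer4` output.

```python
import torch
import torch.nn as nn

# Assumed ResNet-50 head: global average pooling followed by a 2048 -> 1000 linear layer.
avgpool = nn.AdaptiveAvgPool2d((1, 1))
fc = nn.Linear(2048, 1000)

# Stand-in for the layer4 output of ResNet-50 on two 224x224 images.
x = torch.randn(2, 2048, 7, 7)

pooled = avgpool(x)              # [2, 2048, 1, 1]
flat = torch.flatten(pooled, 1)  # [2, 2048]: every dim from 1 onward is merged
logits = fc(flat)                # [2, 1000]

print(pooled.shape, flat.shape, logits.shape)
```

Here `torch.flatten(pooled, 1)` gives the same result as `pooled.view(pooled.size(0), -1)`. The difference is that `torch.flatten` falls back to a copy when a view is not possible, whereas `.view` raises an error on tensors whose memory layout does not allow it.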