Size mismatch error when using pre-calculated pretrained model

I’m trying to pre-calculate the convolution layers of a pre-trained model (specifically densenet161) before training the fully-connected classifier layer. The goal is to speed up training, since the model shouldn’t have to repeatedly recompute the conv layers, which I have frozen.

I’m getting a size mismatch error when I pass the pre-calculated values into my fully-connected layer, a single linear layer with in_features=2208 (the number of output channels of the conv layers) and out_features=5005. I don’t think there’s an issue with my training function; I think I’m just not passing the pre-calculated data to the fully-connected layer in the right dimensions (batch size of 7).

The size of the output is [7, 2208, 7, 7], so it looks right… what am I misunderstanding? Thanks in advance!

Here’s the code for calculating the output of the frozen layers:

# Function to generate convolutional features and labels for a given DataLoader and model
import numpy as np
import torch

def preconvfeat(data_loader, model):
    print('[preconvfeat]')
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)
    model.eval()
    conv_features = []
    labels_list = []
    with torch.no_grad():  # no gradients needed for the frozen feature extractor
        for inputs, labels in data_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            output = model.features(inputs)  # calculate values for the features block only
            conv_features.extend(output.cpu().numpy())
            labels_list.extend(labels.cpu().numpy())
    # stack the per-sample feature maps into a single [N, 2208, 7, 7] array
    conv_features = np.stack(conv_features)

    return (conv_features, labels_list)
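
For reference, here’s a minimal sketch of how the pre-calculated features could be wrapped in a new DataLoader for training the classifier; the names frozen_model and train_loader are placeholders, not from the original post:

import torch
from torch.utils.data import TensorDataset, DataLoader

# assuming frozen_model is a pretrained densenet161 and train_loader is the original image DataLoader
conv_features, labels_list = preconvfeat(train_loader, frozen_model)

# wrap the precomputed [N, 2208, 7, 7] features so the classifier can consume them in batches of 7
feature_dataset = TensorDataset(torch.from_numpy(conv_features),
                                torch.tensor(labels_list))
feature_loader = DataLoader(feature_dataset, batch_size=7, shuffle=True)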

If your features have a shape of [batch_size, 2208, 7, 7], the general approach is either to flatten this tensor to [batch_size, 2208*7*7], or to use some kind of pooling to get a tensor of [batch_size, 2208, 1, 1] and squeeze dims 2 and 3 before passing it to the linear layer.
In the former case, you would have to define in_features=2208*7*7 in your linear layer.
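
A quick sketch of both options on dummy data (the shapes come from the post above; adaptive_avg_pool2d stands in for "some kind of pooling"):

import torch
import torch.nn as nn
import torch.nn.functional as F

features = torch.randn(7, 2208, 7, 7)  # stand-in for the precomputed conv output

# Option 1: flatten and use in_features=2208*7*7
fc_flat = nn.Linear(2208 * 7 * 7, 5005)
out1 = fc_flat(features.view(features.size(0), -1))   # -> [7, 5005]

# Option 2: pool to [7, 2208, 1, 1], squeeze dims 2 and 3, keep in_features=2208
fc_pool = nn.Linear(2208, 5005)
pooled = F.adaptive_avg_pool2d(features, (1, 1))
out2 = fc_pool(pooled.squeeze(3).squeeze(2))           # -> [7, 5005]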

@ptrblck thanks, that’s the solution I was leaning towards, but I thought it odd that I couldn’t directly consume the output of the pre-calculated dense layers, and I was concerned that changing to in_features=2208*7*7 would lead to an incorrect model, since the default pretrained model uses in_features=2208. I’m actually surprised I didn’t get a tensor of [batch_size, 2208, 1, 1]; shouldn’t the output of the dense layers already be a flattened feature tensor, without me having to implement pooling manually afterwards?

You are right. In the original model, adaptive pooling is used in the forward method.
Have a look at this line of code.
Did you add it to your model as well?
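
For example, the same pooling could be mirrored inside preconvfeat, so the stored features already have shape [batch_size, 2208]; this is a sketch under that assumption, not the torchvision source itself:

import torch.nn.functional as F

# inside preconvfeat's loop, after calling model.features(inputs)
output = model.features(inputs)
output = F.relu(output, inplace=True)
output = F.adaptive_avg_pool2d(output, (1, 1)).flatten(1)  # -> [batch_size, 2208]
conv_features.extend(output.cpu().numpy())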


Thanks - I was digging around in that code and completely missed it.
