I don't really understand the difference between kernel as weight and convolution kernel

Keiku · June 26, 2021, 3:07am

What does “kernel” in trained_kernel in the code below mean? Is it different from the kernel people usually talk about in convolutions? Also, is torch.stack([torch.mean(trained_kernel, 1)] * 6, dim=1) a common operation? Is there a reference for it somewhere?

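```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

# get resnet34 model with 6 input channels
def get_model(n_class, pretrained=False):
    model = resnet34(pretrained=pretrained)
    model.fc = nn.Linear(512, n_class)
    # conv1.weight is the first layer's kernel, shape (64, 3, 7, 7)
    trained_kernel = model.conv1.weight
    new_conv = nn.Conv2d(6, 64, kernel_size=7, stride=2, padding=3, bias=False)
    with torch.no_grad():
        # average over the 3 input channels -> (64, 7, 7), then stack 6 copies -> (64, 6, 7, 7)
        new_conv.weight[:, :] = torch.stack([torch.mean(trained_kernel, 1)] * 6, dim=1)
    model.conv1 = new_conv
    return model
```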

eqy · June 26, 2021, 4:21am

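I think in most cases “kernel” and “weight” are used interchangeably for convolutions. For example, “kernel size” specifies the spatial dimensions of the weight tensor: an nn.Conv2d weight has shape (out_channels, in_channels / groups, kernel_height, kernel_width), so resnet34's model.conv1.weight is a 64 × 3 × 7 × 7 tensor, i.e. 64 kernels of size 3 × 7 × 7. trained_kernel is just that weight tensor under another name.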
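A quick way to check that a layer's weight is its convolution kernel: call torch.nn.functional.conv2d directly with conv.weight as the kernel and compare against the layer's own forward pass. The layer below is a sketch that copies the hyperparameters of resnet34's first convolution (3 in, 64 out, 7 × 7, stride 2, padding 3, no bias) but is randomly initialized; the 224 × 224 input size is an arbitrary choice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same hyperparameters as resnet34's conv1 (random weights here, not pretrained)
conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# The weight tensor is the kernel: (out_channels, in_channels, kH, kW)
print(conv.weight.shape)  # torch.Size([64, 3, 7, 7])

x = torch.randn(1, 3, 224, 224)
out_layer = conv(x)

# Convolving "by hand" with conv.weight as the kernel gives the same result
out_manual = F.conv2d(x, conv.weight, stride=2, padding=3)

print(torch.allclose(out_layer, out_manual))  # True
print(out_layer.shape)  # torch.Size([1, 64, 112, 112])
```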

As for torch.stack([torch.mean(trained_kernel, 1)] * 6, dim=1), I don't think it is very common, but it is a simple way to reinitialize the first layer of a model that was trained on three-channel input images when the new task has six-channel input images: the pretrained kernel is averaged over its three input channels, and that average is copied to each of the six new input channels. My guess is that this helps avoid changing the downstream layers too much during finetuning, compared to a completely random reinitialization. Then again, I wonder if batchnorm (resnet34 has a BatchNorm2d layer right after conv1) might already do something similar.
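A minimal sketch of what the reinitialization does, using a randomly initialized stand-in for resnet34's conv1 rather than the pretrained weights. It checks the intermediate shapes, confirms that each of the six input-channel slices equals the channel-averaged kernel, and shows that trained_kernel.mean(dim=1, keepdim=True).repeat(1, 6, 1, 1) is an equivalent way to write the same thing. It also illustrates one side effect relevant to the batchnorm remark: on an input whose channels are all identical, the new layer's output is twice the original layer's, because six copies of the mean kernel sum to twice the sum of the three original channel kernels.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for resnet34's conv1 (random weights, not the pretrained ones)
old_conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
trained_kernel = old_conv.weight  # shape (64, 3, 7, 7)

# Average over the input-channel dimension (dim 1): (64, 3, 7, 7) -> (64, 7, 7)
mean_kernel = torch.mean(trained_kernel, 1)

# Stack 6 copies along a new dim 1: (64, 7, 7) -> (64, 6, 7, 7)
stacked = torch.stack([mean_kernel] * 6, dim=1)

new_conv = nn.Conv2d(6, 64, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    new_conv.weight[:, :] = stacked

# Equivalent formulation: keep the reduced dim, then repeat it 6 times
alt = trained_kernel.mean(dim=1, keepdim=True).repeat(1, 6, 1, 1)

print(mean_kernel.shape)              # torch.Size([64, 7, 7])
print(new_conv.weight.shape)          # torch.Size([64, 6, 7, 7])
print(torch.allclose(stacked, alt))   # True

# Scale check: feed the same single-channel image into every input channel
g = torch.randn(1, 1, 32, 32)
with torch.no_grad():
    out_old = old_conv(g.repeat(1, 3, 1, 1))
    out_new = new_conv(g.repeat(1, 6, 1, 1))
print(torch.allclose(out_new, 2 * out_old, rtol=1e-4, atol=1e-5))  # True
```

The exact factor of two only holds for identical channels; real six-channel inputs will differ. The point is just that the reinitialized layer's output scale can shift, which the bn1 layer following conv1 in resnet34 normalizes using batch statistics in training mode. That fits the guess above that batchnorm may already compensate for part of this.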