Optimizing parameters of a function that generates a convolution kernel, instead of the raw weights

I am currently working on my thesis and have a function that generates a convolution kernel. For now, let us assume that this function takes the location in the kernel and multiplies it by a constant, for example f(x, y) = c(x + y). Rather than optimizing the raw weights of the kernel, I would like autograd to update c during backpropagation.

I have tried creating a custom ‘DumbKernel’ as follows:

import torch
import torch.nn as nn
from torch import Tensor

class DumbKernel(nn.Conv2d):

    def __init__(self, *args, **kwargs):

        super(DumbKernel, self).__init__(*args, **kwargs)

        # Store one parameter 'c' per (out_channel, in_channel) pair.
        # NOTE: don't call this attribute 'parameters', since that would
        # shadow nn.Module.parameters() and break the optimizer setup.
        self.c_params = []
        for out_channel in range(self.out_channels):
            sub_list = []
            for in_channel in range(self.in_channels):
                c = nn.Parameter(torch.tensor([1.0]))
                self.register_parameter(name='{}{}c'.format(out_channel, in_channel), param=c)
                sub_list.append(c)
            self.c_params.append(sub_list)

        # Initialize the filters based on parameters 'c'
        self.initialize_filters()

    def initialize_filters(self):
        for out_channel in range(self.out_channels):
            for in_channel in range(self.in_channels):
                self.weight[out_channel, in_channel] = self.generate_filter(self.c_params[out_channel][in_channel])

    def generate_filter(self, c):

        # Obtain the kernel size (assume a square kernel)
        kernel_size = self.kernel_size[0]

        # Calculate the boundaries to add to the center element
        x_max = kernel_size // 2
        y_max = kernel_size // 2
        x_min = -x_max
        y_min = -y_max

        kernel = torch.ones((kernel_size, kernel_size))

        for y in range(y_min, y_max + 1):
            for x in range(x_min, x_max + 1):
                kernel[y_max - y][x_max - x] = c * (y + x)
        
        return kernel

    def forward(self, input: Tensor) -> Tensor:
        return self._conv_forward(input, self.weight, self.bias)
        

I hoped that autograd would pick up on the fact that the weights are generated from the parameter ‘c’ through the computation graph, but this is not the case. When I use the above code in a training loop, I run into the following error: ValueError: can't optimize a non-leaf Tensor. If I add self.weight = nn.Parameter(self.weight) at the end of the constructor, it does work, but then the raw kernel weights are updated instead of ‘c’. Is there any way to fix this? In theory it should be possible using the chain rule!

Kind regards,
Gerrit

You should probably use nn.functional.conv2d; with it you can use any tensor as the kernel.
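
For example, here is a minimal sketch of that approach, rebuilding the kernel from ‘c’ inside forward() so that autograd traces every weight back to ‘c’ (the class name ParamKernel and its layout are just illustrative, not from your post):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ParamKernel(nn.Module):
    """Convolution whose kernel is generated as f(x, y) = c * (x + y)."""

    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.kernel_size = kernel_size
        # One learnable 'c' per (out_channel, in_channel) pair
        self.c = nn.Parameter(torch.ones(out_channels, in_channels))

    def forward(self, x):
        # Rebuild the kernel from 'c' on every forward pass, so the weight
        # tensor stays a non-leaf node whose graph leads back to 'c'.
        r = self.kernel_size // 2
        offsets = torch.arange(-r, r + 1, dtype=x.dtype, device=x.device)
        # base[i, j] = y + x, with y and x the offsets from the kernel centre
        base = offsets[:, None] + offsets[None, :]
        # Scale by 'c'; weight has shape (out_channels, in_channels, k, k)
        weight = self.c[:, :, None, None] * base
        return F.conv2d(x, weight, padding=r)

# Usage: gradients end up on 'c', not on a raw weight tensor
conv = ParamKernel(in_channels=3, out_channels=8, kernel_size=3)
out = conv(torch.randn(1, 3, 16, 16))
out.sum().backward()
print(conv.c.grad.shape)  # torch.Size([8, 3])

Because the weight is recreated on every call, only ‘c’ is a leaf parameter, so the optimizer only ever updates ‘c’.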


Thank you, this was it!