Customizing a new convolution layer in a CNN

zahra · April 26, 2019, 9:46am

Hi,

I am a beginner in PyTorch. I want to define my own kernel and add it to a CNN. I have been searching for 2 or 3 days and I am so confused! I do not know whether I should implement a CNN in C++ from scratch, build it, and add it to PyTorch, or whether it is enough to implement a new convolution layer with my own kernel and add it to an existing CNN in PyTorch.
I think the second solution is correct. If it is, I would appreciate it if you could guide me through these 2 steps:

1. Make a C++ convolution layer or a Python method
2. Add this newly built layer to a CNN in PyTorch

Many thanks in advance,
Zahra

ptrblck · April 26, 2019, 10:33am

If you would like to define a custom kernel, you could just set the weight attribute to it.
Here is a small example:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, kernel):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 1, 3, 1, 1, bias=False)
        # Add other layers here
        
        # Initialize conv1 with custom kernel
        self.conv1.weight = nn.Parameter(kernel)
        
    def forward(self, x):
        x = self.conv1(x)
        # pass x to other modules
        return x        

kernel = torch.randn(1, 1, 3, 3)  # Define your custom kernel
model = MyModel(kernel)
x = torch.randn(1, 1, 24, 24)
output = model(x)

zahra · April 26, 2019, 10:38am

Thanks for your reply.
No, I mean that when the convolution takes place between the kernel and the input matrix, each element of the feature map for the next layer is generated by operations such as addition and multiplication. I want to replace these operations with my own. That means I should replace, for example, conv1D with my own layer (with its own functionality). Unfortunately, I do not know how.
I would appreciate it if you could guide me.

Best Regards,
Zahra

ptrblck · April 26, 2019, 10:43am

If you would like to create a new layer, you could:

1. create a new nn.Module writing your forward method, if you stick to PyTorch methods
2. create a torch.autograd.Function with a custom forward and backward method, if you need to leave PyTorch or would like to implement it manually

Have a look at the Extending PyTorch docs.
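As a rough illustration of the first option, here is a minimal sketch of a layer built only from PyTorch operations, so autograd derives the backward pass on its own. The class name ConvZ2d and the reading of "replace * by / and + by -" (divide each window element by its weight, then negate the accumulated sum) are assumptions made for this example, not something stated in the thread.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvZ2d(nn.Module):
    """Hypothetical layer: sliding-window op where * becomes / and + becomes -."""

    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.kernel_size = kernel_size
        # Same layout as nn.Conv2d: [out_channels, in_channels, kH, kW].
        # Values are kept away from zero because we divide by them.
        self.weight = nn.Parameter(
            torch.rand(out_channels, in_channels, kernel_size, kernel_size) + 0.5
        )

    def forward(self, x):
        n, _, h, w = x.shape
        k = self.kernel_size
        # patches: [N, C*k*k, L], one column per window position
        patches = F.unfold(x, kernel_size=k)
        w_flat = self.weight.view(self.weight.size(0), -1)  # [out, C*k*k]
        # Divide every window element by its weight: [N, out, C*k*k, L]
        quotients = patches.unsqueeze(1) / w_flat.unsqueeze(0).unsqueeze(-1)
        # Assumed meaning of "subtract instead of add": negate the sum
        out = -quotients.sum(dim=2)  # [N, out, L]
        return out.view(n, -1, h - k + 1, w - k + 1)


layer = ConvZ2d(1, 1, 3)
x = torch.randn(1, 1, 24, 24)
out = layer(x)           # shape [1, 1, 22, 22]
out.sum().backward()     # gradients reach layer.weight without a manual backward
```

Because everything in forward is a differentiable PyTorch op, no torch.autograd.Function is needed here; that second option only becomes necessary when the computation leaves PyTorch or the gradient has to be written by hand.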

zahra · April 26, 2019, 10:48am

Thanks for your reply.
Yes, I studied this page and wrote such an implementation, but as you see, it is a linear function. I cannot find the backend implementation of the convolution layer in order to replace its operations with my own. Do you have any recommendations?

I mean, I want to use a CNN in PyTorch and just replace conv1D with convZ (a conv generated by Zahra), in which I replace + and * with - and /.
Best Regards

kshitij · April 26, 2019, 12:23pm

I don't know if this will be much help, but you may be able to find it here

Read the documentation to build from source once you have made your changes.

zahra · April 26, 2019, 12:27pm

Thanks for your reply.
You mean that I should change the PyTorch source code, rebuild all the files, and install the newly generated PyTorch? Terrific, isn't it?

Best Regards

ptrblck · April 26, 2019, 12:33pm

Before changing the code base I would try to write an nn.Module or autograd.Function to see if the method works as expected.
Even though it will most likely be slower, you could quickly implement it and run a few tests.

One issue I expect is that if you just swap the addition to a subtraction and the multiplication to a division, the new bias could just be the negation of the old one and the weights the reciprocal.
Anyway, I’m not familiar with your use case, but I would still recommend to try your idea first on the Python side.
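The concern about the reciprocal weights and negated bias can be checked numerically. The sketch below assumes only the per-element multiplication and the bias addition are swapped (the accumulation over the window stays a sum); the tensor sizes and names are made up for illustration.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 3)
w = torch.rand(3) + 0.5   # kept away from zero so 1 / w is well defined
b = torch.randn(1)

# Standard form: multiply by the weights, sum, add the bias
standard = (x * w).sum(dim=1) + b

# Swapped form: divide by reciprocal weights, sum, subtract the negated bias
w_swapped = 1.0 / w
b_swapped = -b
swapped = (x / w_swapped).sum(dim=1) - b_swapped

same = torch.allclose(standard, swapped, atol=1e-5)
```

Under this assumption both forms compute the same function, so training the swapped layer could simply learn a reparameterized version of the standard one.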

zahra · April 27, 2019, 4:49pm

I wrote a simple program to customize my convolution layer, as below:

import torch

class convZ(torch.autograd.Function):

    # Note that both forward and backward are @staticmethods
    @staticmethod
    # bias is an optional argument
    def forward(ctx, input, weight, bias=None):

        ctx.save_for_backward(input, weight, bias)
        output = torch.empty(len(input[1]) - 1)

        start_col_indx = 0
        end_col_indx = 2

        out_row_inx = 0
        out_col_indx = 0

        for i in range(len(input[1]) - 1):
            # print(torch.mm(input[:, start_col_indx:end_col_indx], weight))
            conv_mul = input[:, start_col_indx:end_col_indx] * weight
            start_col_indx += 1
            end_col_indx += 1
            conv_sum = torch.sum(conv_mul)
            output[out_col_indx] = conv_sum
            out_col_indx += 1

        # print(output)


        # print(weight.t())
        # output = input.mm(weight.t())
        # output = input + weight
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)
        return output

    # This function has only a single output, so it gets only one gradient
    @staticmethod
    def backward(ctx, grad_output):
        # This is a pattern that is very convenient - at the top of backward
        # unpack saved_tensors and initialize all gradients w.r.t. inputs to
        # None. Thanks to the fact that additional trailing Nones are
        # ignored, the return statement is simple even when the function has
        # optional inputs.
        input, weight, bias = ctx.saved_tensors
        grad_input = grad_weight = grad_bias = None

        # These needs_input_grad checks are optional and there only to
        # improve efficiency. If you want to make your code simpler, you can
        # skip them. Returning gradients for inputs that don't require it is
        # not an error.
        if ctx.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)
        if ctx.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)
        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = grad_output.sum(0).squeeze(0)

        return grad_input, grad_weight, grad_bias


# 2. Define a Convolutional Neural Network
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.convz = convZ.apply
        self.pool = nn.AvgPool1d(2)

    def forward(self, x, w):
        x = self.pool(F.relu(self.convz(x, w)))
        return x

import torch.optim as optim
if __name__ == '__main__':

    input1 = torch.Tensor([[1, 0, 0], [0, 1, 0]])
    weight1 = torch.randn(2, 2, device='cpu', dtype=torch.float, requires_grad=True)
    net = Net()

    # 3. Define a Loss function and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

but I receive this error:

ValueError: optimizer got an empty parameter list

What’s wrong with my code?

Many thanks in advance

ptrblck · April 27, 2019, 5:16pm

Try to define a custom module deriving from nn.Module and create the weight parameter as nn.Parameter. Alternatively if you would like to stick to the functional approach, just define weight = nn.Parameter(torch.randn(2, 2)) and pass [weight] to your optimizer.
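A small sketch of why the parameter list was empty: a module only reports tensors that are registered on it as nn.Parameter (directly or through submodules). The two class names below are hypothetical and exist only for this comparison.

```python
import torch
import torch.nn as nn
import torch.optim as optim


class NoParams(nn.Module):
    def __init__(self):
        super().__init__()
        self.fn = torch.relu          # a plain function, not a parameter
        self.pool = nn.AvgPool1d(2)   # pooling has no learnable weights


class WithParams(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(2, 2))  # registered automatically


n_without = len(list(NoParams().parameters()))
n_with = len(list(WithParams().parameters()))

# An optimizer built from an empty parameter list raises ValueError
try:
    optim.SGD(NoParams().parameters(), lr=0.001)
    raised = False
except ValueError:
    raised = True
```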

zahra · April 27, 2019, 6:37pm

Thanks a lot, but there are 2 problems:

1. With this code:

weight = nn.Parameter(torch.randn(2, 2))
optimizer = optim.SGD(weight, lr=0.001, momentum=0.9)

I received the error “TypeError: params argument given to the optimizer should be an iterable of Tensors or dicts, but got torch.FloatTensor”
2. If I want to have a next layer (for example a linear layer from standard PyTorch) after my custom layer, there are 2 sets of parameters: those of the next layer and those of my custom layer. How can I merge them and pass them to the optimizer?

Many thanks

ptrblck · April 27, 2019, 7:50pm

1. Pass weight as a list:
optimizer = optim.SGD([weight], lr=0.001)
2. Add all parameters to a list or use the module approach:


class MyConvZ(nn.Module):
    def __init__(self):
        super(MyConvZ, self).__init__()
        self.fn = convZ.apply
        self.weight = nn.Parameter(torch.randn(1, 1, 2, 2))
        
    def forward(self, x):
        x = self.fn(x, self.weight)
        return x

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.convz = MyConvZ()
        self.pool = nn.AvgPool1d(2)

    def forward(self, x):
        x = self.pool(F.relu(self.convz(x)))
        return x

net = Net()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

However, the shapes do not match for an image tensor as the input.
The input is expected to have the shape [batch_size, channels, height, width], while the conv weights should be [nb_kernel, in_channels, height, width].
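On the question of handing both sets of parameters to one optimizer, here is a minimal sketch of the two approaches mentioned above. The layer sizes and the class name Both are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.optim as optim

weight = nn.Parameter(torch.randn(2, 2))   # standalone custom parameter
fc = nn.Linear(3, 2)                       # standard layer with weight and bias

# Functional approach: concatenate everything into one list
params = [weight] + list(fc.parameters())
optimizer = optim.SGD(params, lr=0.001, momentum=0.9)


# Module approach: registered parameters and submodules are collected for you
class Both(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(2, 2))
        self.fc = nn.Linear(3, 2)


model = Both()
names = [name for name, _ in model.named_parameters()]
module_optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
```

With the module approach, any standard layer added later as an attribute is picked up by model.parameters() without further bookkeeping.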

zahra · April 28, 2019, 6:44pm

I do not know how to thank you. I was really frustrated before you helped me.
I would appreciate it if you could help me with these 2 new problems:

This is my code (input and its label is just a toy example):

# In the name of God, the One, the Subduer
# O God, I place my hope in You

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class convZ(torch.autograd.Function):

    # Note that both forward and backward are @staticmethods
    @staticmethod
    # bias is an optional argument
    def forward(ctx, input, weight, bias=None):
        ctx.save_for_backward(input, weight, bias)
        output = torch.empty(len(input[1]) - 1)

        start_col_indx = 0
        end_col_indx = 2

        out_row_inx = 0
        out_col_indx = 0

        for i in range(len(input[1]) - 1):
            # print(torch.mm(input[:, start_col_indx:end_col_indx], weight))
            conv_mul = input[:, start_col_indx:end_col_indx] * weight
            start_col_indx += 1
            end_col_indx += 1
            conv_sum = torch.sum(conv_mul)
            output[out_col_indx] = conv_sum
            out_col_indx += 1

        # print(output)


        # print(weight.t())
        # output = input.mm(weight.t())
        # output = input + weight
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)
        return output

    # This function has only a single output, so it gets only one gradient
    @staticmethod
    def backward(ctx, grad_output):
        # This is a pattern that is very convenient - at the top of backward
        # unpack saved_tensors and initialize all gradients w.r.t. inputs to
        # None. Thanks to the fact that additional trailing Nones are
        # ignored, the return statement is simple even when the function has
        # optional inputs.
        input, weight, bias = ctx.saved_tensors
        grad_input = grad_weight = grad_bias = None

        # These needs_input_grad checks are optional and there only to
        # improve efficiency. If you want to make your code simpler, you can
        # skip them. Returning gradients for inputs that don't require it is
        # not an error.
        if ctx.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)
        if ctx.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)
        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = grad_output.sum(0).squeeze(0)

        return grad_input, grad_weight, grad_bias


class MyConvZ(nn.Module):
    def __init__(self):
        super(MyConvZ, self).__init__()
        self.fn = convZ.apply
        # weight tensor = out_channels× in_channels/groups ×kH×kW
        self.weight = nn.Parameter(torch.randn(1, 1, 2, 2)) # when groups=1

    def forward(self, x):
        x = self.fn(x, self.weight)
        return x


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.convZ = MyConvZ()
        self.pool = nn.AvgPool1d(2)

    def forward(self, x):
        x = F.relu(self.convZ(x))
        temp2 = []
        for i in range(len(x)):
            temp2.append(x[i])
        temp3 = [[temp2]]
        temp4 = torch.Tensor(temp3)
        x = self.pool(temp4)
        return x


if __name__ == '__main__':

    # input tensor = minibatch×in_channels×iH×iW
    input1 = torch.Tensor([[1, 0, 0, 1], [0, 1, 0, 1]]) #(1, 1, 2, 4)
    label1 = torch.LongTensor([0.])

    net = Net()

    # 3. Define a Loss function and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

    # zero the parameter gradients
    optimizer.zero_grad()

    # forward + backward + optimize
    outputs = net(input1)
    temp1 = []
    temp1.append(outputs[0])
    temp2 = torch.Tensor([temp1])
    loss = criterion(temp2, label1)
    loss.backward()
    optimizer.step()

    # print statistics
    running_loss = 0.0
    running_loss += loss.item()


print('Finished Training')

1. Since you and I made a custom layer, its parameter is custom too, e.g. self.weight = nn.Parameter(torch.randn(1, 1, 2, 2)).
If we have a next layer from standard PyTorch after this layer, would its parameters be considered automatically, in addition to our custom parameter?
2. After running this program, I received this error:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Many thanks in advance

ptrblck · April 28, 2019, 9:10pm

There seem to be a few issues in your code.

I’m not completely sure how your custom convolution works, but usually an input to a conv layer should have the shape [batch_size, channels, height, width]. Currently your input has only two dimensions, so you might need to unsqueeze it and probably adapt the for loop in your custom conv layer to work on dim2 or dim3.
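For illustration, this is roughly what the unsqueeze step looks like for the toy input from the post, using the built-in F.conv2d as a stand-in for the custom layer (that substitution is an assumption made only to show the shapes).

```python
import torch
import torch.nn.functional as F

x2d = torch.tensor([[1., 0., 0., 1.],
                    [0., 1., 0., 1.]])       # shape [2, 4]
x4d = x2d.unsqueeze(0).unsqueeze(0)          # shape [1, 1, 2, 4]

weight = torch.randn(1, 1, 2, 2)             # [nb_kernel, in_channels, kH, kW]
out = F.conv2d(x4d, weight)                  # shape [1, 1, 1, 3]
```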

You can’t properly backpropagate, since you are detaching the computation graph by wrapping the output in a new tensor.
Just try to pass outputs and targets to your loss function.
However, in the current code you’ll get a size mismatch error.
nn.CrossEntropyLoss expects an input of shape [batch_size, nb_classes] as the model output and [batch_size] as the target tensor containing class indices in the vanilla use case, as described in the docs.
You can also pass multi-dimensional model outputs and targets with additional dimensions to this loss function, e.g. for pixel-wise classification.
Based on your current target shape, it seems that your model output should be the vanilla case: [batch_size, nb_classes].
This is usually done using a linear layer as the output layer.
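The two points above (the detached graph and the expected loss shapes) can be sketched as follows. The tensor sizes are arbitrary and a single matrix multiplication stands in for the model.

```python
import torch
import torch.nn as nn

w = nn.Parameter(torch.randn(3, 2))
x = torch.randn(4, 3)                     # batch_size = 4
logits = x @ w                            # [batch_size, nb_classes] = [4, 2]
target = torch.tensor([0, 1, 1, 0])       # [batch_size], class indices

# Re-wrapping the values in a new tensor drops the autograd history
rewrapped = torch.tensor(logits.tolist())
still_tracked = logits.requires_grad      # True
detached = not rewrapped.requires_grad    # True

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)          # pass the model output directly
loss.backward()                           # gradients now reach w
```

Calling backward on a loss computed from rewrapped instead would raise the "does not require grad" error quoted earlier in the thread.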

zahra · April 29, 2019, 6:38pm

Dear ptrblck
I hope that even if you touch soil, it will turn to gold (it is a proverb in my language and it means I wish the best things for you). I will never forget your help.

I applied every change that you suggested.
I would appreciate it if you could check my code to see whether there is any problem.

# In the name of God, the One, the Subduer
# O God, I place my hope in You

# This code is generated by ptrblck and Zahra Pourbahman
# https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class convZ(torch.autograd.Function):

    # Note that both forward and backward are @staticmethods
    @staticmethod
    # bias is an optional argument
    def forward(ctx, input, weight, bias=None):
        ctx.save_for_backward(input, weight, bias)

        '''Output: (N, C_out, H_out, W_out) where N = batch size = 2, c_out = 1, H=height, W=width
        H_out = (H_in+2×padding[0]−dilation[0]×(kernel_size[0]−1)−1)/stride[0] +1
        W_out = (W_in+2×padding[1]−dilation[1]×(kernel_size[1]−1)−1)/stride[1] +1
        stride = 1
        padding = 0
        dilation = 0
        h_in = 1
        '''
        batch_size = len(input)
        c_out = 1 #len(input[0][0])-1
        h_out = 1
        w_out = len(input[0][0][0]) -1
        output = torch.empty(batch_size, c_out, h_out, w_out)

        start_col_indx = 0
        end_col_indx = 2

        for j in range(len(input)): # batch size
            out_col_indx = 0
            for i in range(len(input[j][0][0]) - 1): # nb of cols in each sample data
                conv_mul = input[j][0][:, start_col_indx:end_col_indx] * weight
                start_col_indx += 1
                end_col_indx += 1
                conv_sum = torch.sum(conv_mul)
                output[j][0][0][out_col_indx] = conv_sum
                out_col_indx +=1

        # output = input.mm(weight.t())
        # output = input + weight
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)
        return output

    # This function has only a single output, so it gets only one gradient
    @staticmethod
    def backward(ctx, grad_output):
        # This is a pattern that is very convenient - at the top of backward
        # unpack saved_tensors and initialize all gradients w.r.t. inputs to
        # None. Thanks to the fact that additional trailing Nones are
        # ignored, the return statement is simple even when the function has
        # optional inputs.
        input, weight, bias = ctx.saved_tensors
        grad_input = grad_weight = grad_bias = None

        # These needs_input_grad checks are optional and there only to
        # improve efficiency. If you want to make your code simpler, you can
        # skip them. Returning gradients for inputs that don't require it is
        # not an error.
        # if ctx.needs_input_grad[0]:
        #     grad_input = grad_output.mm(weight)
        # if ctx.needs_input_grad[1]:
        #     grad_weight = grad_output.t().mm(input)
        # if bias is not None and ctx.needs_input_grad[2]:
        #     grad_bias = grad_output.sum(0).squeeze(0)

        return grad_input, grad_weight, grad_bias


class MyConvZ(nn.Module):
    def __init__(self):
        super(MyConvZ, self).__init__()
        self.fn = convZ.apply
        # weight tensor = out_channels× in_channels/groups ×kH×kW
        self.weight = nn.Parameter(torch.randn(1, 1, 2, 2)) # when groups=1

    def forward(self, x):
        x = self.fn(x, self.weight)
        '''How to initialize weight with arbitrary tensor:
            https://discuss.pytorch.org/t/how-to-initialize-weight-with-arbitrary-tensor/3432'''
        return x


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.convZ = MyConvZ()
        self.pool = nn.AvgPool2d((1, 3))
        self.fc1 = nn.Linear(1 * 1 * 1, 2)

    def forward(self, x):
        x = self.pool(F.relu(self.convZ(x)))
        x = x.view(-1, 1 * 1 * 1)
        x = self.fc1(x)
        return x


if __name__ == '__main__':


    ''' Conv2D when we have just 1 channel(1 input matrix) and the size of the kernel = height of input matrix
    a 2D convolution where the kernel height is equal to the input height:
    batch_size = 2 => nb_input in each batch = 2
    channels = 1 => in RGB photo, each photo includes 3 matrix, so nb_channel = 3, here each input is just a matrix
    height = 2 => nb_row in each input
    width = 4 => nb_col in each input
    kernel_size = (height, 2)
    (N,C_in,H,W) N is a batch size, C_in denotes a number of channels, H is a height of input planes in pixels, and W is width in pixels.
    
    # an example is in "https://discuss.pytorch.org/t/2d-input-with-1d-convolution/20331/2" for conv2d standard for 1 input matrix    
    '''

    input1 = torch.Tensor([
                            [[[1, 0, 0, 1],
                             [0, 1, 0, 1]]],

                            [[[1, 0, 0, 1],
                             [0, 1, 0, 1]]]
                          ])

    label1 = torch.LongTensor([0., 1.])

    net = Net()

    # Define a Loss function and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

    # zero the parameter gradients
    optimizer.zero_grad()

    # forward + backward + optimize
    outputs = net(input1)
    loss = criterion(outputs, label1)
    loss.backward()
    optimizer.step()

    # print statistics
    running_loss = 0.0
    running_loss += loss.item()

    print(running_loss)

print('Finished Training')

# Test the network on the test data
test = torch.Tensor([
                            [[[1, 1, 1, 1],
                             [0, 1, 0, 1]]],

                            [[[0, 0, 0, 0],
                             [0, 1, 0, 1]]]
                          ])

classes = [0, 1]
test_output = net(test)
print(test_output)
_, predicted = torch.max(test_output, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(2)))

print('Finished Testing')

I am looking forward to any recommendations from you…
Also, how can I make sure that you see any new topics I may post in the future?

Many thanks in advance

ptrblck · April 29, 2019, 8:40pm

There seem to be some issues regarding the shape in the forward method.
Currently, input[j][0][:, start_col_indx:end_col_indx] will have the shapes:

torch.Size([2, 2])
torch.Size([2, 1])
torch.Size([2, 0])

which will create an error.

Did you forget to increase end_col_indx?

Also, I might have misunderstood your function. If you would only want to multiply elements of the shape [batch_size, 2] elementwise, your weight parameter might contain only two elements.
Also, the backward method is returning None, which also seems to be wrong. Maybe you would want to comment the calculations back in?
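As a hedged sketch of how both issues could be addressed, the Function below derives the window bounds from the loop index and returns actual gradients from backward. It assumes a single-channel input, a kernel as tall as the input, stride 1, and no padding or bias; the name SlidingWindowFn is made up, and this is not the code the thread eventually settled on.

```python
import torch
import torch.nn.functional as F


class SlidingWindowFn(torch.autograd.Function):
    """Hypothetical fix: full-height kernel slid along the width."""

    @staticmethod
    def forward(ctx, input, weight):
        # input: [N, 1, H, W], weight: [1, 1, H, kW]
        ctx.save_for_backward(input, weight)
        k_w = weight.size(3)
        w_out = input.size(3) - k_w + 1
        output = input.new_empty(input.size(0), 1, 1, w_out)
        for i in range(w_out):
            # Bounds come from the loop index, so they never drift
            window = input[:, :, :, i:i + k_w]            # [N, 1, H, kW]
            output[:, 0, 0, i] = (window * weight).sum(dim=(1, 2, 3))
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, weight = ctx.saved_tensors
        k_w = weight.size(3)
        grad_input = torch.zeros_like(input)
        grad_weight = torch.zeros_like(weight)
        for i in range(grad_output.size(3)):
            g = grad_output[:, :, :, i:i + 1]             # [N, 1, 1, 1]
            # Each output position touches one window of the input
            grad_input[:, :, :, i:i + k_w] += g * weight
            grad_weight += (g * input[:, :, :, i:i + k_w]).sum(dim=0, keepdim=True)
        return grad_input, grad_weight


# Double precision is what gradcheck expects for its finite differences
x = torch.randn(2, 1, 2, 4, dtype=torch.double, requires_grad=True)
w = torch.randn(1, 1, 2, 2, dtype=torch.double, requires_grad=True)
out = SlidingWindowFn.apply(x, w)
matches_conv2d = torch.allclose(out, F.conv2d(x, w))
grads_ok = torch.autograd.gradcheck(SlidingWindowFn.apply, (x, w))
```

torch.autograd.gradcheck compares the hand-written backward against numerical gradients, which is a convenient way to validate a custom Function before swapping in non-standard operations.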

Besides that the general code looks good.

zahra · May 1, 2019, 7:42pm

Dear ptrblck

I appreciate your help. I know how valuable it is.

Many thanks

AG1991 · December 6, 2019, 2:10pm

Dear Zahra, could you please share your final code with us? I am currently working on the same problem. thanks in advance

zahra · December 6, 2019, 2:57pm

Dear AG1991,

Of course, it would be my pleasure to help, but as I mentioned in “Custom nn.Conv2d”, the backward pass was quite complex, so I changed the solution based on what I needed from my code.

I mean that I did not customize the convolution layer. I used the conv2d function provided by PyTorch. If you want to customize a convolution layer, please refer to the mentioned post (“Custom nn.Conv2d”).

Good Luck

dpappas · December 13, 2022, 2:59pm

It seems like in the above code your loop selects a matrix of size 0.

If you add, above the following line,

input[j][0][:, start_col_indx:end_col_indx]

these lines

print((j, 0, start_col_indx, end_col_indx))
print(input[j][0][:, start_col_indx:end_col_indx].size())

You would see it.
It seems like you are missing padding for your convolution.
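The debugging prints can be reproduced on a dummy tensor of the same shape as the toy input. The sketch below assumes the counters are initialized once outside the batch loop, as in the posted code, and contrasts that with bounds derived from the inner loop index; the list names are made up.

```python
import torch

x = torch.zeros(2, 1, 2, 4)          # [batch_size, channels, height, width]
k_w = 2

# Counters initialized once, outside the batch loop (as in the posted code)
start, end = 0, k_w
shared = []
for j in range(x.size(0)):
    for i in range(x.size(3) - 1):
        shared.append(tuple(x[j][0][:, start:end].shape))
        start += 1
        end += 1

# Bounds derived from the inner loop index, so they restart for every sample
per_sample = []
for j in range(x.size(0)):
    for i in range(x.size(3) - k_w + 1):
        per_sample.append(tuple(x[j][0][:, i:i + k_w].shape))
```

In this trace the shared counters run past the input width once the second sample is reached, which yields the shrinking and then empty windows; restarting the bounds per sample keeps every window at [2, 2].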