Custom convolution layer

Hello,

I would like to implement my own convolution layer in PyTorch, just for practice. I want to do that with some limitations:

  1. I don’t want to use a bias (maybe I will add it later).
  2. All operations should be calculated on a single vector taken from the image (a sliding window). For example, for a 3x3 kernel that vector should have 9 elements.
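
To make the second requirement concrete, here is a minimal sketch of how F.unfold turns an image into flattened sliding-window vectors. The 4x4 single-channel input is only an illustrative assumption, not data from the thread.

```python
import torch
import torch.nn.functional as F

# One 1-channel 4x4 image with values 0..15, shape (N=1, C=1, H=4, W=4).
x = torch.arange(16, dtype=torch.float32).view(1, 1, 4, 4)

# F.unfold returns (N, C * kh * kw, L), where L is the number of window positions.
cols = F.unfold(x, kernel_size=3)      # shape (1, 9, 4)

# Put the window index first so that each row is one flattened 3x3 window.
windows = cols.transpose(1, 2)         # shape (1, 4, 9)

first_window = windows[0, 0]           # top-left 3x3 patch in row-major order
print(windows.shape)                   # 4 windows, 9 values each
print(first_window)                    # values 0, 1, 2, 4, 5, 6, 8, 9, 10
```

Each row of windows is the 9-element vector that would be multiplied with a 9-element kernel vector.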

Here is my code (based on other topics):

class MyConv2d(nn.Module):
    def __init__(self, n_channels, out_channels, kernel_size, dilation=1, padding=0, stride=1):
        super(MyConv2d, self).__init__()

        self.kernel_size = (kernel_size, kernel_size)
        self.kernal_size_number = kernel_size * kernel_size
        self.out_channels = out_channels
        self.dilation = (dilation, dilation)
        self.padding = (padding, padding)
        self.stride = (stride, stride)
        self.n_channels = n_channels
        self.conv = Parameter(torch.Tensor(self.out_channels, self.n_channels, self.kernal_size_number))

    def forward(self, x):
        width = self.calculateNewWidth(x)
        height = self.calculateNewHeight(x)
        result = torch.zeros(
            [x.shape[0] * self.out_channels, width, height], dtype=torch.float32, device=device
        )
        windows = self.calculateWindows(x)

        for channel in range(x.shape[1]):
            for i_convNumber in range(self.out_channels):
                xx = torch.matmul(windows[channel], self.conv[i_convNumber][channel])
                xx = xx.view(-1, width, height)
                result[i_convNumber * xx.shape[0] : (i_convNumber + 1) * xx.shape[0]] += xx

        result = result.view(x.shape[0], self.out_channels, width, height)
        return result

    def calculateWindows(self, x):
        windows = F.unfold(
            x, kernel_size=self.kernel_size, padding=self.padding, dilation=self.dilation, stride=self.stride
        )

        windows = windows.transpose(1, 2).contiguous().view(-1, x.shape[1], self.kernal_size_number)
        windows = windows.transpose(0, 1)

        return windows

    def calculateNewWidth(self, x):
        return (
            (x.shape[2] + 2 * self.padding[0] - self.dilation[0] * (self.kernel_size[0] - 1) - 1)
            // self.stride[0]
        ) + 1

    def calculateNewHeight(self, x):
        return (
            (x.shape[3] + 2 * self.padding[1] - self.dilation[1] * (self.kernel_size[1] - 1) - 1)
            // self.stride[1]
        ) + 1

Unfortunately, that code doesn’t work. I always get about 10% accuracy (so like a random classifier) on the CIFAR-10 dataset. Do you know where I made a mistake? Please remember that I want to work with a single sliding vector.

Thanks for your help!

Do you initialize self.conv somewhere? I cannot find it.
If you use torch.Tensor, the values will be uninitialized, so they might contain arbitrary values, including NaN.
Could you try to use torch.randn or a specific initialization for your conv kernels and try your code again?
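
As a rough sketch of that suggestion: the shape below mirrors the first layer in the question, and Kaiming-uniform is only one reasonable choice assumed here, not something the thread prescribes.

```python
import math
import torch
import torch.nn as nn

out_channels, in_channels, k = 64, 3, 3

# torch.empty (like torch.Tensor(...)) only allocates memory; the contents are arbitrary.
weight = nn.Parameter(torch.empty(out_channels, in_channels, k * k))

# Overwrite the arbitrary contents in place with a Kaiming-uniform initialization.
nn.init.kaiming_uniform_(weight, a=math.sqrt(5))

print(torch.isfinite(weight).all())    # every entry is a finite number after init
print(weight.abs().max())              # bounded by 1 / sqrt(fan_in), with fan_in = 3 * 9
```

With a=math.sqrt(5) the bound works out to 1 / sqrt(fan_in), so the values stay small and finite instead of being whatever happened to be in memory.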

Thank you, but unfortunately that wasn’t the mistake (or rather, not the only one). With your suggestion the model accuracy is exactly the same, about 10%.

I think the code that changes the tensor shape (using “view”) might be problematic, but I am a beginner in PyTorch and I can’t find any mistake.

I tried:

self.conv = Parameter(torch.rand(self.out_channels, self.n_channels, self.kernal_size_number))

,

self.conv = Parameter(torch.Tensor(self.out_channels, self.n_channels, self.kernal_size_number))
torch.nn.init.xavier_uniform_(self.conv)

or

self.conv = Parameter(torch.Tensor(self.out_channels, self.n_channels, self.kernal_size_number))

n = self.n_channels
for k in self.kernel_size:
    n *= k
stdv = 1. / math.sqrt(n)
self.conv.data.uniform_(-stdv, stdv)

Hi,

I decided to return to my problem. I checked everything once again and can confirm that my class MyConv2d works well. I compared the output from my layer with the output from torch.nn.Conv2d (with fixed weights equal to the weights from my layer, without bias) and the outputs are equal, but…

When I created a simple network with my layer (code below), I discovered that the problem is with backpropagation. All weights in my layer are fixed and don’t change during training. How can I force my layer to be trainable?

I suppose that the loops (code below) are problematic and PyTorch can’t create a proper graph to calculate gradients for backpropagation.

def forward(self, x):
        width = self.calculateNewWidth(x)
        height = self.calculateNewHeight(x)
        windows = self.calculateWindows(x)
        
        result = torch.zeros(
            [x.shape[0] * self.out_channels, width, height], dtype=torch.float32, device=device
        )

        for channel in range(x.shape[1]):
            for i_convNumber in range(self.out_channels):
                xx = torch.matmul(windows[channel], weights[i_convNumber][channel]) 
                xx = xx.view(-1, width, height)
                result[i_convNumber * xx.shape[0] : (i_convNumber + 1) * xx.shape[0]] += xx
                
        result = result.view(x.shape[0], self.out_channels, width, height)
        return result  

class CnnModel(nn.Module):
    def __init__(self):
        super(CnnModel, self).__init__()
        #self.conv1 = nn.Conv2d(3, 64, 3, bias=False)
        #self.conv2 = nn.Conv2d(64, 32, 3, bias=False)
        
        self.conv1 = MyConv2d(3, 64, 3)
        self.conv2 = MyConv2d(64, 32, 3)

        self.pool = nn.MaxPool2d(2, 2)

        self.fc1 = nn.Linear(32 * 6 * 6, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.pool(x)

        x = self.conv2(x)
        x = F.relu(x)
        x = self.pool(x)
        
        x = x.view(-1, 32 * 6 * 6)

        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        
        return x
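
The comparison against torch.nn.Conv2d mentioned above could be sketched roughly as follows. The shapes and the seed are assumptions for illustration. An element-wise check is used because summing signed differences can let errors cancel out.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 3, 8, 8)
weight = torch.randn(4, 3, 3, 3)            # (out_channels, in_channels, kh, kw)

# Unfold-based convolution: one matrix product over all flattened windows.
cols = F.unfold(x, kernel_size=3)           # (2, 27, 36)
out_unfold = weight.view(4, -1) @ cols      # (2, 4, 36) via broadcasting
out_unfold = out_unfold.view(2, 4, 6, 6)

# Reference result from the built-in convolution, without bias.
out_ref = F.conv2d(x, weight)

max_abs_diff = (out_unfold - out_ref).abs().max()
print(max_abs_diff)
print(torch.allclose(out_unfold, out_ref, atol=1e-4))
```

If the maximum absolute difference is not close to zero, or is NaN, the two layers do not compute the same thing for those weights.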

Using this forward method and copying it to the class definition from your last post seems to work:

class MyConv2d(nn.Module):
    def __init__(self, n_channels, out_channels, kernel_size, dilation=1, padding=0, stride=1):
        super(MyConv2d, self).__init__()

        self.kernel_size = (kernel_size, kernel_size)
        self.kernal_size_number = kernel_size * kernel_size
        self.out_channels = out_channels
        self.dilation = (dilation, dilation)
        self.padding = (padding, padding)
        self.stride = (stride, stride)
        self.n_channels = n_channels
        self.weights = nn.Parameter(torch.Tensor(self.out_channels, self.n_channels, self.kernal_size_number))

    def forward(self, x):
        width = self.calculateNewWidth(x)
        height = self.calculateNewHeight(x)
        windows = self.calculateWindows(x)
        
        result = torch.zeros(
            [x.shape[0] * self.out_channels, width, height], dtype=torch.float32, device=device
        )

        for channel in range(x.shape[1]):
            for i_convNumber in range(self.out_channels):
                xx = torch.matmul(windows[channel], self.weights[i_convNumber][channel]) 
                xx = xx.view(-1, width, height)
                result[i_convNumber * xx.shape[0] : (i_convNumber + 1) * xx.shape[0]] += xx
                
        result = result.view(x.shape[0], self.out_channels, width, height)
        return result  

    def calculateWindows(self, x):
        windows = F.unfold(
            x, kernel_size=self.kernel_size, padding=self.padding, dilation=self.dilation, stride=self.stride
        )

        windows = windows.transpose(1, 2).contiguous().view(-1, x.shape[1], self.kernal_size_number)
        windows = windows.transpose(0, 1)

        return windows

    def calculateNewWidth(self, x):
        return (
            (x.shape[2] + 2 * self.padding[0] - self.dilation[0] * (self.kernel_size[0] - 1) - 1)
            // self.stride[0]
        ) + 1

    def calculateNewHeight(self, x):
        return (
            (x.shape[3] + 2 * self.padding[1] - self.dilation[1] * (self.kernel_size[1] - 1) - 1)
            // self.stride[1]
        ) + 1

device = 'cpu'
conv = MyConv2d(3, 1, 3)
x = torch.randn(1, 3, 24, 24)
out = conv(x)
out.mean().backward()
print(conv.weights.grad)
> tensor([[[ 0.0884,  0.0803,  0.0611,  0.0813,  0.0692,  0.0481,  0.0903,
           0.0772,  0.0580],
         [-0.0810, -0.0723, -0.0521, -0.0655, -0.0548, -0.0443, -0.0535,
          -0.0374, -0.0262],
         [-0.0439, -0.0307, -0.0260, -0.0412, -0.0239, -0.0154, -0.0400,
          -0.0287, -0.0227]]])

PS: I had to change the nn.Parameter name to self.weights and call self.weights in torch.matmul in your forward method.
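
For comparison, here is a sketch of a loop-free variant that still works on flattened window vectors. UnfoldConv2d is a hypothetical name, it assumes stride 1 with no padding or dilation, and the randn * 0.1 initialization is an arbitrary choice. It is not the code from the thread.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class UnfoldConv2d(nn.Module):
    """Hypothetical loop-free variant (stride 1, no padding, no dilation)."""

    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        # One flattened k*k vector per (output channel, input channel) pair.
        self.weights = nn.Parameter(
            torch.randn(out_channels, in_channels, kernel_size * kernel_size) * 0.1
        )

    def forward(self, x):
        n, c, h, w = x.shape
        k = self.kernel_size
        out_h, out_w = h - k + 1, w - k + 1
        cols = F.unfold(x, kernel_size=k)        # (n, c*k*k, L)
        windows = cols.view(n, c, k * k, -1)     # (n, c, k*k, L)
        # Dot each window vector with its weight vector, then sum over input channels.
        out = torch.einsum("nckl,ock->nol", windows, self.weights)
        return out.reshape(n, self.out_channels, out_h, out_w)


conv = UnfoldConv2d(3, 2, 3)
x = torch.randn(1, 3, 24, 24)
out = conv(x)
out.mean().backward()
print(out.shape, conv.weights.grad.shape)
```

Because the output is built from x directly, it ends up on the same device as the input, so no global device variable is needed.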

Thank you, but it still doesn’t work. I still think that there is some problem with the training process. Maybe you have some other ideas?

  1. You can find my full code below. I use the CIFAR-10 dataset for testing. The result is:
Files already downloaded and verified
Files already downloaded and verified
-------Your example:
tensor([[[-0.0345, -0.0376, -0.0394, -0.0299, -0.0289, -0.0306, -0.0450,
          -0.0433, -0.0467],
         [-0.0135, -0.0142, -0.0209, -0.0043, -0.0014, -0.0082, -0.0083,
          -0.0053, -0.0138],
         [ 0.0018, -0.0113, -0.0043, -0.0084, -0.0320, -0.0282, -0.0141,
          -0.0362, -0.0300]]], device='cuda:0')
-------Compare my Conv with  nn.Conv2d
tensor(-1.2677e+31, device='cuda:0', grad_fn=<SumBackward0>)
------- Training:
~10% accuracy, for example:
22/ 1563 --- Loss: 0000nan | Acc: 09.5109 			(00070/00736)

I see that the loss is 0. Of course, I ran this code for longer than 22 batches.

  2. If I change my layer MyConv2d to nn.Conv2d, the accuracy increases to a normal value of ~60%. With my layer the accuracy is ~10%.
#self.conv1 = nn.Conv2d(3, 64, 3, bias=False)
#self.conv2 = nn.Conv2d(64, 32, 3, bias=False)
        
self.conv1 = MyConv2d(3, 64, 3)
self.conv2 = MyConv2d(64, 32, 3)
  3. If I add this print:
def forward(self, x):
    print(self.weights[0][0][0:5].cpu().detach())

then I can see that the weights don’t change (I tested other weights too). There are two tensors because there are two convolutional layers.

tensor([-4.9320e-21,  4.5769e-41, -4.9320e-21,  4.5769e-41, -3.8139e-18])
tensor([-4.9321e-21,  4.5769e-41,  3.8115e-35,  0.0000e+00, -3.7481e-18])

...after 1 epoch weights are the same...

tensor([-4.9320e-21,  4.5769e-41, -4.9320e-21,  4.5769e-41, -3.8139e-18])
tensor([-4.9321e-21,  4.5769e-41,  3.8115e-35,  0.0000e+00, -3.7481e-18])
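
Printing a handful of values can hide small updates, and magnitudes such as 4.5769e-41 look consistent with memory that was never initialized. A more direct check is to snapshot every parameter before one optimizer step and measure the change afterwards. The sketch below uses nn.Linear as a stand-in module and lr=0.1; both are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)                      # stand-in for any module with parameters
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Copy the parameters so that the later in-place update does not affect the snapshot.
before = {name: p.detach().clone() for name, p in model.named_parameters()}

loss = model(torch.randn(8, 4)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Largest absolute change per parameter after one update.
max_change = {
    name: (p.detach() - before[name]).abs().max().item()
    for name, p in model.named_parameters()
}
print(max_change)
```

A change of exactly zero, or NaN, for a parameter points at its gradient or its initial values rather than at the optimizer.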

from sys import stdout

import torch
import torchvision
import torchvision.transforms as transforms

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import numpy as np
import math

##############

device = torch.device("cuda:0")
epochs = 10
batch_size = 32

##############

transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False)

##############

class MyConv2d(nn.Module):
    def __init__(self, n_channels, out_channels, kernel_size, dilation=1, padding=0, stride=1):
        super(MyConv2d, self).__init__()

        self.kernel_size = (kernel_size, kernel_size)
        self.kernal_size_number = kernel_size * kernel_size
        self.out_channels = out_channels
        self.dilation = (dilation, dilation)
        self.padding = (padding, padding)
        self.stride = (stride, stride)
        self.n_channels = n_channels
        self.weights = nn.Parameter(torch.Tensor(self.out_channels, self.n_channels, self.kernal_size_number))

    def forward(self, x):
        width = self.calculateNewWidth(x)
        height = self.calculateNewHeight(x)
        windows = self.calculateWindows(x)
        
        result = torch.zeros(
            [x.shape[0] * self.out_channels, width, height], dtype=torch.float32, device=device
        )

        for channel in range(x.shape[1]):
            for i_convNumber in range(self.out_channels):
                xx = torch.matmul(windows[channel], self.weights[i_convNumber][channel]) 
                xx = xx.view(-1, width, height)
                result[i_convNumber * xx.shape[0] : (i_convNumber + 1) * xx.shape[0]] += xx
                
        result = result.view(x.shape[0], self.out_channels, width, height)
        return result  

    def calculateWindows(self, x):
        windows = F.unfold(
            x, kernel_size=self.kernel_size, padding=self.padding, dilation=self.dilation, stride=self.stride
        )

        windows = windows.transpose(1, 2).contiguous().view(-1, x.shape[1], self.kernal_size_number)
        windows = windows.transpose(0, 1)

        return windows

    def calculateNewWidth(self, x):
        return (
            (x.shape[2] + 2 * self.padding[0] - self.dilation[0] * (self.kernel_size[0] - 1) - 1)
            // self.stride[0]
        ) + 1

    def calculateNewHeight(self, x):
        return (
            (x.shape[3] + 2 * self.padding[1] - self.dilation[1] * (self.kernel_size[1] - 1) - 1)
            // self.stride[1]
        ) + 1

    def get_weights(self):
        kernal_size = int(math.sqrt(self.kernal_size_number))
        return nn.Parameter(self.weights.view(self.out_channels, self.n_channels, kernal_size, kernal_size))

##############

print("-------")
conv = MyConv2d(3, 1, 3).cuda()
x = torch.randn(1, 3, 24, 24).cuda()
out = conv(x)
out.mean().backward()
print(conv.weights.grad)
print("-------")

##############

class TestModel(nn.Module):
    def __init__(self):
        super(TestModel, self).__init__()
        
        self.conv1 = MyConv2d(3, 64, 3)
        self.conv2 = torch.nn.Conv2d(3, 64, 3, bias=False)
        self.conv2.weight = self.conv1.get_weights()  
        
    def forward(self, x):
        y1 = self.conv1(x)
        y2 = self.conv2(x)
        
        return [y1,y2]
    
model = TestModel().to(device)

x, _ = [ x[0:32] for x in next(iter(trainloader)) ]
x = x.to(device)

result = model(x)
print(torch.sum(result[1]-result[0]))
print("-------")
    
##############

class CnnModel(nn.Module):
    def __init__(self):
        super(CnnModel, self).__init__()
        #self.conv1 = nn.Conv2d(3, 64, 3, bias=False)
        #self.conv2 = nn.Conv2d(64, 32, 3, bias=False)
        
        self.conv1 = MyConv2d(3, 64, 3)
        self.conv2 = MyConv2d(64, 32, 3)

        self.pool = nn.MaxPool2d(2, 2)

        self.fc1 = nn.Linear(32 * 6 * 6, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.pool(x)

        x = self.conv2(x)
        x = F.relu(x)
        x = self.pool(x)
        
        x = x.view(-1, 32 * 6 * 6)

        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        
        return x
    
##############    

model = CnnModel().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
model.train()

for epoch in range(epochs):
    train_loss = 0.0
    correct = 0
    total = 0

    for batch_idx, (inputs, targets) in enumerate(trainloader, 0):
        inputs, targets = inputs.to(device), targets.to(device)

        optimizer.zero_grad()

        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

        stdout.write("\r{:5d}/{:5d} --- Loss: {:07.5f} | Acc: {:07.4f} \t\t\t({:05d}/{:05d})".format(
                batch_idx,
                len(trainloader),
                train_loss / (batch_idx + 1),
                100.0 * correct / total,
                correct,
                total,
            )
        )
        stdout.flush()
    stdout.write("\n")
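
A note on the progress line written by the training loop above: with the format spec {:07.5f}, a NaN loss is zero-padded and shows up as 0000nan, which is easy to misread as zero. A minimal sketch with an explicit check follows; the NaN value is constructed by hand here purely for illustration.

```python
import math
import torch

loss = torch.tensor(float("nan"))            # hand-made NaN, for illustration only

# Same format spec as the progress line in the training loop.
printed = "{:07.5f}".format(loss.item())
print(printed)                               # prints 0000nan

# An explicit guard makes a non-finite loss hard to overlook.
loss_is_finite = math.isfinite(loss.item())
print(loss_is_finite)                        # prints False
```

In a real loop, such a check could stop training at the first non-finite loss instead of continuing with meaningless updates.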

Could anyone give me some advice? Sorry for a post with nothing new to contribute, but I have no idea what’s going on…

@kamil4u hey, did you find the problem with your code?

@AG1991
My problem was more complex than I described here. I am still working on it.

But here the main problem was the learning rate. If you increase it, the code above should work.

@kamil4u

I want to try forward propagation first with your code, but I only found backpropagation. Should I rewrite it?

@111227
I don’t understand. You can find def forward(self, x): in the code. That fragment is the forward propagation. Backpropagation is handled by PyTorch itself.
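
A tiny sketch of that point, using a hypothetical toy module named Scale: only forward is written, and autograd derives the backward pass from it.

```python
import torch
import torch.nn as nn


class Scale(nn.Module):
    """Toy module (hypothetical): multiplies its input by a learnable scalar."""

    def __init__(self):
        super().__init__()
        self.factor = nn.Parameter(torch.tensor(2.0))

    def forward(self, x):
        return self.factor * x


layer = Scale()
x = torch.tensor([1.0, 2.0, 3.0])
layer(x).sum().backward()        # no backward method was written by hand

print(layer.factor.grad)         # d/d(factor) of sum(factor * x) = sum(x) = 6
```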

@kamil4u
Sorry for the misunderstanding, I found def forward in the code.
By the way, did you find the problem? I tried many things, but the accuracy is still at ~10%.

Did you set requires_grad of your parameter to True?
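
For reference, a tensor wrapped in nn.Parameter has requires_grad=True by default, so this usually only needs checking if it was switched off somewhere. A quick way to inspect it is sketched below; the small Sequential model is just an assumed stand-in.

```python
import torch
import torch.nn as nn

weights = nn.Parameter(torch.randn(2, 3, 9))
print(weights.requires_grad)     # True by default for nn.Parameter

# Stand-in model; any nn.Module can be inspected the same way.
model = nn.Sequential(
    nn.Conv2d(3, 4, 3, bias=False),
    nn.ReLU(),
    nn.Conv2d(4, 2, 3, bias=False),
)
flags = {name: p.requires_grad for name, p in model.named_parameters()}
print(flags)
```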