How to implement multiple different kernel shapes

Hello. I’m currently working on a spherical convolutional network. Right now I’m trying to develop a new kind of kernel for the convolutional layer.
The usual kernel is a 3x3 matrix, but spherical images become distorted after being projected onto a plane with an equirectangular projection. So I want to define the kernel as a spherical cap and project it onto the plane according to its position.
For example, the kernel at different positions on the sphere, seen in the panorama picture, will look like this:

Is there any way to determine the shape of the kernels in this way? I already have the full coordinates of the points in every case. I would very much appreciate any help and information.
Thank you guys very much!

Hi, can anyone provide an idea or method for defining a custom kernel shape? Please give some help!
Thank you!:cry:

I’m not sure I understand the use case completely, but would zeroing out specific spatial positions of a square kernel work (also zeroing out the gradients of these positions)?
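As a minimal sketch of that masking idea (the cross-shaped mask here is just an illustrative choice, not anything from your projection):

```python
import torch
import torch.nn as nn

# Zero out chosen positions of a 3x3 kernel and keep them zero during
# training by masking the gradient as well.
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)

mask = torch.tensor([[0., 1., 0.],
                     [1., 1., 1.],
                     [0., 1., 0.]])

with torch.no_grad():
    conv.weight.mul_(mask)  # zero out the masked positions

# zero the corresponding gradient entries on every backward pass
conv.weight.register_hook(lambda grad: grad * mask)

x = torch.randn(1, 1, 8, 8)
out = conv(x)
out.sum().backward()
print(conv.weight.grad[0, 0])  # zeros at the masked positions
```

Since the masked weights start at zero and their gradients stay zero, an optimizer step never changes them.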

Hi. Thank you for replying!
Actually, I’m going to change this method since it seems to have too high a computational cost.
My concern right now is to change the input’s pixel accordingly before performing the convolution. May I continue to ask you about my new concern here or should I open a new topic?

Please continue your explanations here. :slight_smile:

Thank you! You are so kind!
My problem is a little hard to understand, so I will try my best to express it. I hope you can follow it.
The convolution I would like to perform is the conv2d case.
Instead of changing the shape of the kernel many times, I would like to keep the conv2d process the same regarding kernel shape (3x3) and the way a normal conv2d learns, etc.
The thing I want to do can be said in this way:
First, in normal 2D convolution, the multiplication between the kernel and the input works like this:

But now, for every small 3x3 part of the input, I will only keep the center pixel. The surrounding pixels of the 3x3 part will be replaced by corresponding pixels at other positions of the input (which I already know). An example of the surrounding pixels’ positions is shown in the picture of the first question:

These new pixels and the center pixel together form a new 3x3 matrix, which is then multiplied with the kernel in the normal way. So I technically only interfere with the first step of the 2D convolution.
Is there any way of implementing that? Thank you!

Thanks for the detailed description!
I think the best shot at the problem would be to use the functional API and to recreate the patches as you need them.

Since you only want to keep the center pixel, you could probably just use indexing of the input tensor to get these values.
How would you get the outer values? Do you have some kind of mapping, or would indexing / gather also work?
As you can see I’m quite unsure about the creation of these input patches, so feel free to give some (code) examples.
Once you have the input patches, you could use F.conv2d and your weight parameter to perform the vanilla convolution.
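As a rough sketch of that last step (the shapes here are made-up assumptions: L patches, 1 input channel, a few output channels):

```python
import torch
import torch.nn.functional as F

# Once the (modified) 3x3 patches exist, the convolution itself is just
# F.conv2d applied to them with your own weight parameter.
L, out_channels = 10, 4
patches = torch.randn(L, 1, 3, 3)            # one 3x3 patch per output position
weight = torch.randn(out_channels, 1, 3, 3)  # learnable kernel

out = F.conv2d(patches, weight)              # -> [L, out_channels, 1, 1]
out = out.view(L, out_channels)              # one value per patch and channel
print(out.shape)  # torch.Size([10, 4])
```

Treating each patch as a sample in the batch means a 3x3 input convolved with a 3x3 kernel yields exactly one scalar per output channel.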

Thank you for your suggestion!
Right now I have the idea of implementing the creation of the matrix used for each multiplication with the kernel as in the code below:

import numpy as np

# mapping from a patch's center coordinate to the positions of its
# eight surrounding pixels on the input image
coor_dict = {
    (1, 1): np.array([[0, 263, 536, 852, 1920, 2988, 3304, 3577],
                      [2, 2, 1, 0, 0, 0, 1, 2]])
}
coordinate = (1, 1)
indices = coor_dict[coordinate]

image = np.random.uniform(low=0, high=255, size=(2160, 3840))

# copy, so that modifying the patch does not write back into the image
patch = image[0:3, 0:3].copy()

print('Old patch: \n', patch)

patch[2, 1] = image[indices[1, 0], indices[0, 0]]
patch[2, 2] = image[indices[1, 1], indices[0, 1]]
patch[1, 2] = image[indices[1, 2], indices[0, 2]]
patch[0, 2] = image[indices[1, 3], indices[0, 3]]
patch[0, 1] = image[indices[1, 4], indices[0, 4]]
patch[0, 0] = image[indices[1, 5], indices[0, 5]]
patch[1, 0] = image[indices[1, 6], indices[0, 6]]
patch[2, 0] = image[indices[1, 7], indices[0, 7]]

print('New patch: \n', patch)

In this case, the input image would be w=3840, h=2160. I have a dictionary containing the indices of the surrounding pixels on the input image for each center pixel of a 3x3 patch (in this case (1,1)).
After performing the multiplication between this patch and the kernel, we continue by forming the patch at the next position. I would like the convolution to have padding = 1 and stride = 1 so the output keeps the same spatial size with an increased number of channels.
I am just a beginner in both Python and PyTorch, so detailed instructions would be a huge favor to me!

What I would like to do now is inspect the part of the source code that extracts each 3x3 patch from the input for each multiplication step of the convolution. Can you please point me to the file that performs this task?
If I want to change this file and apply it to my project, what should I do?
Thank you!

If you want to experiment with the kernels and stay flexible, I would recommend staying on the Python side for now and using Tensor.unfold to create the patches as described here. Once you have the patches, you can apply your custom conv operation, reshape the result back, etc.

If you really want to manipulate the backend implementation, convolutions are dispatched here.


Thank you for the suggestion!
I think using Tensor.unfold may be the solution for my case, but I’m finding it hard to understand how this function works and how it can be used to define a custom convolution layer. Could you provide some instructions?

The linked post gives you an example of how to create patches using a kernel size and a stride.
You can ignore the padding as well as the “reshape back” section for now.

Each patch will represent basically the sliding window approach of a convolution, so that you can apply any method on it.
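A small self-contained example of those sliding windows (a 5x5 ramp input, so the windows are easy to read):

```python
import torch

# Tensor.unfold creates 3x3 sliding windows with stride 1.
x = torch.arange(25.).view(1, 1, 5, 5)

patches = x.unfold(2, 3, 1).unfold(3, 3, 1)    # [1, 1, 3, 3, 3, 3]
patches = patches.contiguous().view(-1, 3, 3)  # 9 patches of shape 3x3

print(patches.shape)  # torch.Size([9, 3, 3])
print(patches[0])     # top-left 3x3 window of x
```

Each row of `patches` is one position of the sliding window, in row-major order over the output positions.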


Now what I’m able to do is take an image of size (2160x3840) and divide it into patches of size (3x3) with stride = 1. Then I have a set of patches that I can access and modify.
The code that I figured out from the link you provided is as below.

import torch

x = torch.randn(1, 1, 2160, 3840)
kc, kh, kw = 1, 3, 3  # kernel size
dc, dh, dw = 1, 1, 1  # stride

# sliding 3x3 windows over the channel, height and width dimensions
patches = x.unfold(1, kc, dc).unfold(2, kh, dh).unfold(3, kw, dw)
unfold_shape = patches.size()
patches = patches.contiguous().view(-1, kc, kh, kw)  # [num_patches, 1, 3, 3]

After applying changes to this set of patches, it will still have the same shape. Now it is time to perform the convolution.
Since the stride is 1, the patches overlap, and after the processing the values in the overlapping regions are no longer the same. Therefore, combining these patches back into one new picture seems impossible. I’m thinking about defining a custom convolution layer for this case.
Is there a way to implement this for a CNN? Thank you!

Note that a convolution performs a reduction.
Each patch will be multiplied with the kernel and the sum of the result will be calculated yielding a single scalar value for each patch.
These scalar outputs will create the activation output as is also done internally in the convolution implementations.
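To make the reduction concrete, here is a small sketch checking the patch-wise multiply-and-sum against F.conv2d:

```python
import torch
import torch.nn.functional as F

# Each patch is multiplied elementwise with the kernel and summed to a
# single scalar, which is exactly what conv2d computes per output position.
x = torch.randn(1, 1, 6, 6)
weight = torch.randn(1, 1, 3, 3)

patches = x.unfold(2, 3, 1).unfold(3, 3, 1).contiguous().view(-1, 3, 3)
manual = (patches * weight[0, 0]).sum(dim=(1, 2)).view(1, 1, 4, 4)

reference = F.conv2d(x, weight)  # same result, up to float rounding
print(torch.allclose(manual, reference, atol=1e-5))  # True
```

So there is no need to stitch the overlapping patches back into one image; each patch simply collapses into one output pixel.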

Hi @ptrblck,
Right now I’m trying to divide the image into 3x3 patches, process them, and then stitch them back together into a new one.
There is a problem with my processing step: it takes too much time.
Here is my code for the process:

import torch
import numpy as np

coor = {
    (0, 0): np.array([[0, 263, 536, 852, 1920, 2988, 3304, 3577],
                      [2, 2, 1, 0, 0, 0, 1, 2]])
}
image = torch.randn(1, 1, 2160, 3840)
patches = torch.randn(921600, 1, 3, 3)

def change_patches(patches, coor, i, j):
    coordinate = [i + 1, j + 1]
    indices = torch.tensor(coor[(coordinate[0], coordinate[1])])
    a = int(coordinate[0] * 1280 + coordinate[1] / 3)
    patches[a, 0, 2, 1] = image[0, 0, indices[1, 0], indices[0, 0]]
    patches[a, 0, 2, 2] = image[0, 0, indices[1, 1], indices[0, 1]]
    patches[a, 0, 1, 2] = image[0, 0, indices[1, 2], indices[0, 2]]
    patches[a, 0, 0, 2] = image[0, 0, indices[1, 3], indices[0, 3]]
    patches[a, 0, 0, 1] = image[0, 0, indices[1, 4], indices[0, 4]]
    patches[a, 0, 0, 0] = image[0, 0, indices[1, 5], indices[0, 5]]
    patches[a, 0, 1, 0] = image[0, 0, indices[1, 6], indices[0, 6]]
    patches[a, 0, 2, 0] = image[0, 0, indices[1, 7], indices[0, 7]]

I’m trying to find a way to implement this in less time. Since this process is sequential, I want to perform it in parallel. I found multiprocessing.Pool, whose starmap() function only works on the CPU, and I noticed that PyTorch has a similar module, torch.multiprocessing.
How do I use it? Could you provide an example for my case?
Thank you very much!!!

Can you help with my problem? Thank you!
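Rather than multiprocessing, a vectorized lookup may avoid the Python-level loop over patches entirely. A rough sketch, assuming the eight neighbor coordinates of all P patches have been stacked from the coor dictionary into two tensors `rows` and `cols` of shape [P, 8] (random values stand in for them here):

```python
import torch

P = 1000
image = torch.randn(1, 1, 2160, 3840)
rows = torch.randint(0, 2160, (P, 8))  # assumed: stacked row indices per patch
cols = torch.randint(0, 3840, (P, 8))  # assumed: stacked column indices per patch
patches = torch.randn(P, 1, 3, 3)

# gather all surrounding pixels for all patches in one indexing call: [P, 8]
values = image[0, 0, rows, cols]

# scatter them into the eight border positions of every patch at once;
# the order matches the original loop: (2,1), (2,2), (1,2), (0,2), ...
positions = [(2, 1), (2, 2), (1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0)]
for k, (r, c) in enumerate(positions):
    patches[:, 0, r, c] = values[:, k]
```

This loops over the 8 border positions instead of the 921600 patches, so all the heavy work is done by tensor indexing and runs on the GPU if the tensors live there.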