Custom Convolution Dot Product

@hughperkins Is it possible to do the same thing with PyTorch 0.3? (I cannot upgrade the version.)
If yes, can you show me an example? Thank you.

I think unfold is new in 0.4.

The actual unfold function itself has been present in torch for years, so you could use FFI to call it. It’s not that hard to use FFI from PyTorch AFAIK, but it’s not something I can describe in a few lines, partly because I’d have to go away and google, searching for pytorch forum posts on it. I’ve done FFI from Python before though, and it’s fairly painless, if you have a few days to kill.

On the whole, if upgrading to 0.4 will take less than a few hours of effort, I’d probably go for the 0.4 solution.

(Note that ‘unfold’ used to be called ‘im2col’, as Simon alludes to.)


Your example and link helped me a lot!! Thanks! I’m quite new to this field, and still have one question about the bias part. From what I saw, the bias term is a vector (i.e. a 1D tensor) with the size of the output channels. How do I add this bias term to the unfold version of the convolution?

Create a 1D tensor with requires_grad=True, and use broadcasting to add it to the result of the matrix multiplication.

I tried this way before, but I could not get the broadcasting to work.

Say my multiplication result has the size [32, 6, 28, 28], and the bias term is a 1D tensor of size 6. Broadcasting these two together results in an error:
"The expanded size of the tensor (28) must match the existing size (6) at non-singleton dimension 3".
Is there anything wrong with my implementation?

You need to unsqueeze the last dimension of your bias tensor twice. Concretely, your bias tensor needs to have the dimensions [6, 1, 1].
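A minimal sketch of that fix (the shapes are the ones from this thread; the variable names are just for illustration):

import torch

res = torch.randn(32, 6, 28, 28)            # result of the unfold-style convolution
bias = torch.randn(6, requires_grad=True)   # one bias per output channel

# [6] -> [6, 1, 1], so it broadcasts over the two spatial dimensions
out = res + bias.unsqueeze(-1).unsqueeze(-1)
print(out.shape)  # torch.Size([32, 6, 28, 28])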

Got it! Thanks for your help!


I am interested in the other way around: unfolding the kernel weight matrix (instead of X) and then multiplying it with a flattened X to get the same result. But I am having quite some difficulty getting the dimensions right.

Would this be possible?

(I understand that the matrices will become a lot larger this way)

@Tycho_van_der_Oudera Unclear to me what you are asking. The GEMM way of doing convolution involves ‘flattening out’ both the spatial input tensor and the spatial kernels. However, for the kernels there’s no need to do any ‘unrolling’ as such; a pytorch .view is sufficient. (You can check my code above for an example.)

Why do you have to unfold the input, but can just .view the kernel?

If you want to use matrix multiplication to calculate convolution, the input matrix needs to be ‘duplicated’, but the kernel does not.

unfold does the duplication, as the sketch below shows.
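A toy sketch of that duplication (my own example, not from the code in this thread): a 3x3 input with a 2x2 kernel window gives four overlapping patches, so interior values show up in several columns.

import torch
import torch.nn.functional as F

x = torch.arange(9.).view(1, 1, 3, 3)   # one 3x3 single-channel image
cols = F.unfold(x, kernel_size=2)       # each column is one flattened 2x2 patch
print(cols.shape)  # torch.Size([1, 4, 4]): 4 values per patch, 4 patch positions
print(cols[0])
# tensor([[0., 1., 3., 4.],
#         [1., 2., 4., 5.],
#         [3., 4., 6., 7.],
#         [4., 5., 7., 8.]])
# The centre value 4. appears in every column: that is the duplication.
# The kernel, by contrast, only ever needs weight.view(c_out, -1).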

Thanks for your code.

What is the @ part in res = kernels_flat @ Xunfold?

Can I replace the multiplication and addition in this operation with my own mymult(num1, num2) and myadd(num1, num2)?

(Update: thank you very much, I solved my issue; I replaced it with nested loops to get what I wanted.)

Hey @lucamocerino, did you find a solution to your problem? I too am trying to modify the dot multiplication in the convolution layer. Thanks.

Avoid nested loops! Try to keep your code as functional as you can. Loops don’t parallelize well on the GPU.

I also don’t know for sure what that @ operator is.

From the documentation, it seems to be
https://pytorch.org/docs/stable/torch.html#torch.bmm

In his example, the shapes are not both 3-dimensional; they just match the size in one of the dimensions, 16. So I am assuming the multiplication can be broadcast, thus https://pytorch.org/docs/stable/torch.html#torch.matmul. However, I could not get an actual confirmation of which operation that @ maps to.


@ is matrix multiplication in Python 3. Yes, you are right about the nested loops part. I am trying to do all of it with functional code.
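To confirm with a quick check (the shapes here are my own illustration): on tensors, @ dispatches to torch.matmul, which broadcasts batch dimensions, rather than torch.bmm.

import torch

a = torch.randn(16, 75)        # e.g. kernels_flat: [c_out, c_in*k*k]
b = torch.randn(32, 75, 196)   # e.g. unfolded input: [batch, c_in*k*k, n_patches]

print(torch.equal(a @ b, torch.matmul(a, b)))  # True
print((a @ b).shape)  # torch.Size([32, 16, 196]); a is broadcast over the batch dim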

I was just able to make it work for me with the code below.
PS: when pasting here I decided to put the hard-coded values at the top; I don’t think I messed anything up.

Let me know if it still doesn’t run. I am putting a Gist here for you.

import numpy as np
import torch

batch_size = 32
size_img = 32
size_k = 5
padd_k = 2
c_in, c_out = 64, 64

batch_sample = np.random.normal(size=(batch_size, c_in, size_img, size_img))
kernel = np.random.normal(size=(c_out, c_in, size_k, size_k))

device = torch.device('cuda:0') \
  if torch.cuda.is_available() else torch.device('cpu')
print(device)

_batch_sample = torch.from_numpy(batch_sample).float().to(device)
_kernel = torch.from_numpy(kernel).float().to(device)

def conv2d_spatial(x, filters):
  return torch.nn.functional.conv2d(x, filters, padding=padd_k)

def conv2d_spatial2(x, filters):
  # im2col: [batch, c_in*k*k, n_patches]
  unfold = torch.nn.functional.unfold(x, size_k, padding=padd_k)
  # flatten the kernels to [c_out, c_in*k*k]
  kernels_flat = filters.view(c_out, -1)
  # batched matmul -> [batch, c_out, n_patches]
  out = kernels_flat @ unfold
  return out.view(-1, c_out, size_img, size_img)
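For what it’s worth, a quick way to check the two versions agree (a sanity check I’d add, using the tensors defined above):

out1 = conv2d_spatial(_batch_sample, _kernel)
out2 = conv2d_spatial2(_batch_sample, _kernel)
print(torch.allclose(out1, out2, atol=1e-4))  # True, up to float accumulation differences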

Thanks a lot for your code, I really appreciate it. Actually my problem was slightly different: I had to simulate the behavior of certain adders and multipliers inside the “@” part [1]. I solved it exactly like your code does, using numpy for my multipliers and adders.

@eduardo4jesus I believe the size of the output of the convolution is not the image size; I think it’s (height_image - height_filter + 2 * padding) + 1
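Right: in general O = (W - F + 2P)/S + 1. With the hard-coded values in the snippet above (W = 32, F = 5, P = 2, S = 1) that gives O = (32 - 5 + 4) + 1 = 32, so the view to size_img only happened to work because that kernel size and padding preserve the spatial size.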


Thanks for pointing that out; at some point I fixed that up. I was just revisiting the topic, looking into whether the same implementation and mathematical equations for convolution hold for stride > 1.
Though I have used the code below, I have an assert somewhere else to guarantee stride == 1.

def _conv2d(cls, input, weight, stride=(1, 1), padding=(0, 0), dilation=(1, 1)):
    # (excerpt from a classmethod; assumes import torch.nn.functional as F)
    # From https://cs231n.github.io/convolutional-networks/
    # O = (W - F + 2P)/S + 1, constrained so that O is an integer.
    # Since we only support stride == 1:
    # O = (W - F + 2P) + 1
    # (assumes square spatial dims: out_size is used for height and width)
    out_size = input.size(-2) - weight.size(-2) + 2 * padding[0] + 1

    input_unfolded = F.unfold(input, weight.size()[2:], dilation, padding, stride)
    weight_flat = weight.reshape(weight.size(0), -1)
    conv = weight_flat @ input_unfolded
    output = conv.reshape(input.size(0), weight.size(0), out_size, out_size)
    return output
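A quick sanity check against the built-in (my own harness; MyConv is a hypothetical class holding the classmethod above):

import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)
# hypothetical: MyConv is whatever class the method above lives on
out = MyConv._conv2d(x, w, stride=(1, 1), padding=(1, 1), dilation=(1, 1))
print(torch.allclose(out, F.conv2d(x, w, padding=1), atol=1e-5))  # True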

Would the back-propagation for a convolution with stride be the same as for a regular convolution? What would the derivative for stride > 1 be?