# Backprop over kernels in a CNN

Let’s say a convolutional layer takes an input 𝑋 with dimensions 5x100x100 and applies 10 filters 𝐹 of size 5x5x5, producing an output 𝑂 of 10 feature maps, each 96x96.
During backpropagation the layer receives 𝑑𝐸/𝑑𝑂 of shape 10x96x96.
My question is: how do I compute 𝑑𝐸/𝑑𝐹?

According to [that article](https://medium.com/@20!7csm1006/forward-and-backpropagation-in-convolutional-neural-network-4dfa96d7b37e), 𝑑𝐸/𝑑𝐹 can be calculated as a convolution between 𝑋 and 𝑑𝐸/𝑑𝑂.
Unfortunately, the article does not cover the case of multiple filters and multiple input channels.
Since 𝑋 has shape 5x100x100 and 𝑑𝐸/𝑑𝑂 has shape 10x96x96, the depth of 𝑋 is 5 and the depth of 𝑑𝐸/𝑑𝑂 is 10, so the depth dimensions do not match. How do I compute the convolution in that case?
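
One way to read the shapes, written out as a sketch that assumes stride 1, no padding, a single input sample, and the cross-correlation convention most frameworks use for "convolution": every filter has one 5x5 slice per input channel, and the gradient of slice (k, c) pairs channel c of 𝑋 with map k of 𝑑𝐸/𝑑𝑂. The index names k, c, i, j, m, n are introduced here for illustration only.

$$
\frac{\partial E}{\partial F_{k,c,i,j}} \;=\; \sum_{m=0}^{95}\sum_{n=0}^{95} \frac{\partial E}{\partial O_{k,m,n}}\; X_{c,\,m+i,\,n+j},
\qquad k = 0,\dots,9,\quad c = 0,\dots,4,\quad i,j = 0,\dots,4
$$

Under those assumptions the depths never have to match: the result has shape 10x5x5x5, the same as 𝐹, and each of the 50 slices is a 2D correlation of a 100x100 channel with a 96x96 gradient map.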

The author posted a solution to this problem, as shown in the image. But it shows that the gradient of each filter is the same across its channels, which I could not reproduce with my code.

Is the method wrong or is something wrong with my code?
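
A quick check that may help separate the two possibilities, offered as a sketch rather than a diagnosis: the per-channel slices of the weight gradient coincide only when the input channels themselves are identical (for example a grayscale image copied into three channels). The helper `weight_grad` and the random tensors are hypothetical stand-ins, not taken from the article or from the code in this post.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)


def weight_grad(x, c_out=3, k=3):
    """Hypothetical helper: dE/dF from autograd for a random filter bank."""
    w = torch.randn(c_out, x.shape[1], k, k, requires_grad=True)
    out = F.conv2d(x, w)
    out.backward(torch.randn_like(out))  # random stand-in for dE/dO
    return w.grad


# Three genuinely different input channels
x_distinct = torch.randn(1, 3, 16, 16)
# One channel copied three times (e.g. a grayscale image stored as RGB)
x_gray = torch.randn(1, 1, 16, 16).repeat(1, 3, 1, 1)

g_distinct = weight_grad(x_distinct)
g_gray = weight_grad(x_gray)

# Compare the gradient slices of input channel 0 against channels 1 and 2
distinct_same = torch.allclose(g_distinct[:, 0], g_distinct[:, 1], atol=1e-4)
gray_same = (torch.allclose(g_gray[:, 0], g_gray[:, 1], atol=1e-4)
             and torch.allclose(g_gray[:, 0], g_gray[:, 2], atol=1e-4))
print("distinct channels give equal slices:", distinct_same)
print("copied channels give equal slices:", gray_same)
```

With distinct channels the slices differ, and with copied channels they coincide, so a figure showing equal gradients across channels would be consistent with an example whose input channels are all the same.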

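```python
import cv2
import matplotlib.pyplot as plt
import numpy as np
import torch

# NOTE: image1, image2 and ref_tensor1 were never defined in the original post.
# Random stand-ins are assumed here so that the snippet runs end to end.
rng = np.random.default_rng(0)
image1 = rng.integers(0, 256, size=(300, 300, 3), dtype=np.uint8)
image2 = rng.integers(0, 256, size=(300, 300, 3), dtype=np.uint8)
ref_tensor1 = torch.randn(225, 225)  # 256 - 32 + 1 = 225, the conv output size

ref_tensor1 = ref_tensor1.unsqueeze(0).unsqueeze(0)  # (1, 1, 225, 225)
print(ref_tensor1.shape)

# Resize, then move channels first: HWC -> CHW
image1 = cv2.resize(image1, dsize=(256, 256))
image1 = np.rollaxis(image1, 2)
image2 = cv2.resize(image2, dsize=(256, 256))
image2 = np.rollaxis(image2, 2)
# Conv2d expects float input, so cast the uint8 images
img_tensor1 = torch.from_numpy(image1).unsqueeze(0).float()
img_tensor2 = torch.from_numpy(image2).unsqueeze(0).float()
img_tensors = torch.cat((img_tensor1, img_tensor2), 0)
print("Input_image_shape:", img_tensors.shape)
# print(img_tensors)


class torch_model1(torch.nn.Module):
    def __init__(self, ic, oc, ks):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(in_channels=ic, out_channels=oc, kernel_size=ks, stride=1)

    def forward(self, x):
        return self.conv1(x)


### 1,3 ###
model1 = torch_model1(3, 3, 32)
op1 = model1(img_tensor1)
print(op1.shape)
# assert(op1.shape == ref_tensor1.shape)
loss = torch.abs(op1 - ref_tensor1).mean()  # ref broadcasts over the 3 output channels
print(loss)
loss.backward()

########## RESULT(1,3,x,y): changes if seed is not set and the gradient is same for all channels ##########
# The original subplots were empty; plotting filter 0's gradient per input channel is an assumption.
grad = model1.conv1.weight.grad  # (3, 3, 32, 32)
for c in range(3):
    plt.subplot(1, 3, c + 1)
    plt.imshow(grad[0, c].numpy())
plt.show()
```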

I don't know if it helps, but I wrote a post on it here: http://soumith.ch/ex/pages/2014/08/07/why-rotate-weights-convolution-gradient/

It goes into the calculations in a bit more detail.

This post shows the calculation for the gradient of the input. I need the calculation for the gradient of the filter weights.
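
For the filter weights, a minimal sketch of the multi-channel case, assuming stride 1, no padding, and a batch of one: the gradient is built one (filter, input channel) pair at a time and then compared against autograd. The spatial size is shrunk from the 100x100 in the question to keep it fast, and all variable names are illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Channel counts and kernel size from the question; H is reduced from 100 to 12.
C_in, C_out, K, H = 5, 10, 5, 12
X = torch.randn(1, C_in, H, H)
Fw = torch.randn(C_out, C_in, K, K, requires_grad=True)

O = F.conv2d(X, Fw)            # (1, 10, 8, 8)
dE_dO = torch.randn_like(O)    # random stand-in for the upstream gradient
O.backward(dE_dO)              # autograd reference, stored in Fw.grad

# Manual version: one 2D cross-correlation per (filter k, channel c) pair.
manual = torch.zeros(C_out, C_in, K, K)
for k in range(C_out):
    for c in range(C_in):
        x_c = X[:, c:c + 1]        # (1, 1, 12, 12): channel c of the input
        g_k = dE_dO[:, k:k + 1]    # (1, 1, 8, 8): map k of dE/dO, used as the kernel
        manual[k, c] = F.conv2d(x_c, g_k)[0, 0]  # (5, 5) slice of dE/dF

matches = torch.allclose(manual, Fw.grad, atol=1e-4)
print(tuple(manual.shape), "matches autograd:", matches)
```

The double loop is only there to make the pairing explicit; it is slow, and with a batch larger than one the per-sample results would additionally have to be summed.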

Hi @Srinjay_Sarkar, did you figure this out?
Adding padding, dilation, stride, and channels makes this a very mind-twisting exercise!

@Sia_Rezaei, you might want to look at this. It has Python implementations for calculating the gradients of the weights, biases, and inputs. But using it for a whole network is extremely slow. I used a C++ extension to call the cuDNN backprop, which is much faster.
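
As a rough illustration, assuming the Python implementations meant here are the helpers in `torch.nn.grad` (the `grad.py` mentioned in this thread), the sketch below calls `conv2d_weight` and `conv2d_input` with stride and padding and checks them against autograd. The bias gradient is computed by hand as a sum over the batch and spatial dimensions; the sizes are arbitrary.

```python
import torch
import torch.nn.functional as F
from torch.nn.grad import conv2d_input, conv2d_weight

torch.manual_seed(0)

stride, padding = 2, 1
x = torch.randn(2, 5, 21, 21, requires_grad=True)
w = torch.randn(10, 5, 5, 5, requires_grad=True)
b = torch.randn(10, requires_grad=True)

out = F.conv2d(x, w, b, stride=stride, padding=padding)  # (2, 10, 10, 10)
grad_out = torch.randn_like(out)  # random stand-in for dE/dO
out.backward(grad_out)            # autograd reference in x.grad, w.grad, b.grad

# Explicit gradients computed from the same upstream gradient
gw = conv2d_weight(x.detach(), w.shape, grad_out, stride=stride, padding=padding)
gx = conv2d_input(x.shape, w.detach(), grad_out, stride=stride, padding=padding)
gb = grad_out.sum(dim=(0, 2, 3))  # the bias is added at every position of its map

w_ok = torch.allclose(gw, w.grad, rtol=1e-4, atol=1e-4)
x_ok = torch.allclose(gx, x.grad, rtol=1e-4, atol=1e-4)
b_ok = torch.allclose(gb, b.grad, rtol=1e-4, atol=1e-4)
print("weight:", w_ok, "input:", x_ok, "bias:", b_ok)
```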

@Srinjay_Sarkar thanks! Yes, I just found out about `grad.py` and yea, it is slow!
Can you share how you call cudnn backprop? Or point me to the right direction? Thanks!

This is a good starting point.