# Manually/explicitly calculate gradients of Conv kernels

1. Background:

I can calculate the gradient of a cost function `loss` with respect to `x` in two ways: (1) writing out the explicit analytic formula by hand, and (2) using the `torch.autograd` package. Here is my example:

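```python
import torch
import torch.nn.functional as F

for i in range(10):
    x = torch.randn(8, 1, 128, 128)
    y = torch.randn(8, 512, 4, 4)
    k = torch.randn(512, 1, 32, 32)

    # cost function is [(1/2)||k*x-y||_F^2], one value per sample
    loss = lambda z: 0.5 * (F.conv2d(z, k, stride=32) - y).pow(2).sum(dim=[1, 2, 3])

    # 1: calculate gradient of x explicitly and manually
    # (the transposed convolution maps the residual k*x-y back to the shape of x)
    x_grad_manual = F.conv_transpose2d(F.conv2d(x, k, stride=32) - y, k, stride=32)

    # 2: calculate gradient of x with torch.autograd
    x_var = x.clone().requires_grad_()
    x_var_loss = loss(x_var)
    x_grad_auto = torch.autograd.grad(x_var_loss.sum(), x_var)[0]

    # check if the results of implementations 1 and 2 are equal
    print((x_grad_manual - x_grad_auto).pow(2).mean())
```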

Since the mean squared error between the results of the two implementations is very small (about 3.4*10^(-8)), I think they match and the manual implementation works correctly.

2. My Problem

I am unsure how to explicitly write out the gradients of variables (features and Conv kernels) when the forward pass is a composition of several operations. For instance, I do not know how to calculate the gradients with respect to the feature `x` and the Conv kernel `w1` in the following context:

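```python
import torch
import torch.nn.functional as F

for i in range(10):
    x = torch.randn(8, 32, 128, 128)
    y = torch.randn(8, 512, 4, 4)
    k = torch.randn(512, 1, 32, 32)
    w1 = torch.randn(1, 32, 3, 3)

    def loss(z, w):
        z_forward = F.conv2d(z, w, padding=1)  # z = w1 * x
        # cost function is [(1/2)||k*z-y||_F^2]
        return 0.5 * (F.conv2d(z_forward, k, stride=32) - y).pow(2).sum(dim=[1, 2, 3])

    # 1: calculate gradients of x and w1 explicitly and manually
    # (not yet implemented; this is what I am asking about)

    # 2: calculate gradients of x and w1 with torch.autograd
    x_var = x.clone().requires_grad_()
    w1_var = w1.clone().requires_grad_()
    x_var_loss = loss(x_var, w1_var)
    x_grad_auto, w1_grad_auto = torch.autograd.grad(x_var_loss.sum(), [x_var, w1_var])

    # check if the results of implementations 1 and 2 are equal
```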

3. Extension:

Furthermore, if the forward pass is more complicated than the one above, with two intermediate Conv layers and a `ReLU` activation, how can I write out the gradients? Please see the following problem:

```python
import torch
import torch.nn.functional as F

for i in range(10):
    x = torch.randn(8, 32, 128, 128)
    y = torch.randn(8, 512, 4, 4)
    k = torch.randn(512, 1, 32, 32)
    w1 = torch.randn(32, 32, 3, 3)
    w2 = torch.randn(1, 32, 3, 3)

    def loss(z, q1, q2):
        z_forward = F.conv2d(z, q1, padding=1)  # z = w1 * x
        z_forward = F.relu(z_forward, inplace=True)  # z = ReLU(w1 * x)
        z_forward = F.conv2d(z_forward, q2, padding=1)  # z = w2 * ReLU(w1 * x)
        # cost function is [(1/2)||k*z-y||_F^2]
        return 0.5 * (F.conv2d(z_forward, k, stride=32) - y).pow(2).sum(dim=[1, 2, 3])

    # 1: calculate gradients of x, w1 and w2 explicitly and manually
    # (not yet implemented; this is what I am asking about)

    # 2: calculate gradients of x, w1 and w2 with torch.autograd
    x_var = x.clone().requires_grad_()
    w1_var = w1.clone().requires_grad_()
    w2_var = w2.clone().requires_grad_()
    x_var_loss = loss(x_var, w1_var, w2_var)
    grads_auto = torch.autograd.grad(x_var_loss.sum(), [x_var, w1_var, w2_var])

    # check if the results of implementations 1 and 2 are equal
```

4. Guarantee of Differentiability

As in my first example, I hope that the manual gradient calculations are fully explicit and themselves differentiable, so that I can use some of them inside my neural network implementation. Could you please show me how to achieve this?

5. The Reason for Posting This Problem

In a neural network I constructed, I need to calculate the gradients of my pre-defined cost functions (as you can see above) with respect to some features and Conv kernels. In my current implementation, I directly use the `torch.autograd` package to calculate these gradients. However, it seems that some errors accumulate and mislead the learning process when I train the network.

(The whole neural network has its own `loss` function and `backward` process. I just added some extra inner gradient calculations to achieve my goals.)

I conjecture that I should calculate the gradients manually rather than call `torch.autograd` directly inside the network's forward pass, since some computational graphs and backward passes may be nested and lead to wrong weight updates.

In my experiments, I trained two networks (with manual and automatic gradient calculations, as in the first example) and got similar results. But when I extend this to more complicated forward passes (like the two problems posted above), training becomes unstable. So I want to write out the gradients manually to avoid implementation mistakes and conduct more experiments.

Hi Bin!

To get the gradient of the result of one function applied to the result of
another function, that is, of the composition of two functions, you would
use the chain rule. This is how autograd computes the gradient when
many functions are composed together, such as the successive layers
in a network.
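
As a rough illustration of the chain rule applied by hand, here is a minimal sketch for the conv, ReLU, conv, strided-conv forward pass from the question. It is not a definitive implementation. The shapes are shrunk and the tensors are `float64` (both my assumptions, so that the check runs quickly and round-off stays small), and the names `a`, `h`, `r`, `S` and `g_*` are made up for this example. It relies on `F.conv_transpose2d` acting as the gradient of `F.conv2d` with respect to its input, and on `torch.nn.grad.conv2d_weight` for the gradient with respect to the kernel.

```python
import torch
import torch.nn.functional as F
from torch.nn.grad import conv2d_weight

torch.manual_seed(0)
dt = torch.float64  # assumption: double precision for a clean comparison
# Assumption: smaller shapes than in the question, same structure.
x = torch.randn(2, 4, 16, 16, dtype=dt)
y = torch.randn(2, 6, 4, 4, dtype=dt)
k = torch.randn(6, 1, 4, 4, dtype=dt)
w1 = torch.randn(4, 4, 3, 3, dtype=dt)
w2 = torch.randn(1, 4, 3, 3, dtype=dt)
S = 4  # stride of the final convolution (32 in the question)

def loss(z, q1, q2):
    h = F.relu(F.conv2d(z, q1, padding=1))
    h = F.conv2d(h, q2, padding=1)
    return 0.5 * (F.conv2d(h, k, stride=S) - y).pow(2).sum(dim=[1, 2, 3])

# Forward pass, keeping every intermediate that the backward pass needs.
a = F.conv2d(x, w1, padding=1)      # pre-activation
h = F.relu(a)                       # hidden feature
z = F.conv2d(h, w2, padding=1)      # single-channel feature fed to k
r = F.conv2d(z, k, stride=S) - y    # residual k*z - y

# Backward pass: apply the chain rule one layer at a time, last layer first.
g_z = F.conv_transpose2d(r, k, stride=S)            # dL/dz
g_w2 = conv2d_weight(h, w2.shape, g_z, padding=1)   # dL/dw2
g_h = F.conv_transpose2d(g_z, w2, padding=1)        # dL/dh
g_a = g_h * (a > 0).to(a.dtype)                     # ReLU passes gradient only where a > 0
g_w1 = conv2d_weight(x, w1.shape, g_a, padding=1)   # dL/dw1
g_x = F.conv_transpose2d(g_a, w1, padding=1)        # dL/dx

# Reference gradients from autograd (loss summed over the batch).
xv, w1v, w2v = (t.clone().requires_grad_() for t in (x, w1, w2))
ref = torch.autograd.grad(loss(xv, w1v, w2v).sum(), [xv, w1v, w2v])
for name, manual, auto in zip(("x", "w1", "w2"), (g_x, g_w1, g_w2), ref):
    print(name, torch.allclose(manual, auto))
```

Dropping the first convolution and the ReLU mask (that is, treating `h` as the input) leaves the simpler two-function case. Each manual step is written with ordinary tensor operations, which is what you would want if these expressions are to sit inside a larger network.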

It is true that floating-point round-off error can accumulate during
backpropagation (as it can during the forward pass, as well). Underflow
and overflow "errors" can occur as well. Nonetheless, autograd does
an altogether solid job of performing these numerical computations.
It is unlikely that you will be able to do better calculating your own