# Understanding gradient values after the backward pass

Hi, I realize that I don't understand how the gradient values we get after calling backward are computed.
For example:

```python
x = torch.randn((4, 5), requires_grad=True)
z = x.mean()
z.backward()
```

Assume I have the values of x

```python
>>> x
tensor([[-0.3571,  0.1481,  0.1713, -1.2597, -0.7667],
        [-0.1553, -0.9620,  0.0103,  3.3494,  0.2220],
        [ 2.1131, -0.2404,  0.4820,  0.3816,  1.9752],
        [ 1.7232, -0.5064, -0.8151,  0.3720,  0.1470]], requires_grad=True)
```

And `x.grad` returns:

```python
tensor([[1.1500, 1.1500, 1.1500, 1.1500, 1.1500],
        [1.1500, 1.1500, 1.1500, 1.1500, 1.1500],
        [1.1500, 1.1500, 1.1500, 1.1500, 1.1500],
        [1.1500, 1.1500, 1.1500, 1.1500, 1.1500]])
```

And if I change `z = x.mean()` to `z = x.sum()`, `x.grad` becomes:

```python
tensor([[2.1500, 2.1500, 2.1500, 2.1500, 2.1500],
        [2.1500, 2.1500, 2.1500, 2.1500, 2.1500],
        [2.1500, 2.1500, 2.1500, 2.1500, 2.1500],
        [2.1500, 2.1500, 2.1500, 2.1500, 2.1500]])
```

I would like to know how the values 1.1500 and 2.1500 are computed.

Hi Two!

The gradients you report are obviously wrong. I cannot reproduce this
with a pytorch-version-0.3.0 test script:

```python
import torch
print (torch.__version__)

torch.manual_seed (2020)

w1 = torch.randn (4, 5)
w2 = w1.clone()

x1 = torch.autograd.Variable (w1, requires_grad = True)
z1 = x1.mean()
z1.backward()
print (x1.grad)

x2 = torch.autograd.Variable (w2, requires_grad = True)
z2 = x2.sum()
z2.backward()
print (x2.grad)
```

Here is the output showing the correct gradients:

```python
>>> import torch
>>> torch.__version__
'0.3.0b0+591e73e'
>>>
>>> torch.manual_seed (2020)
<torch._C.Generator object at 0x000001E2AF8B6630>
>>>
>>> w1 = torch.randn (4, 5)
>>> w2 = w1.clone()
>>>
>>> x1 = torch.autograd.Variable (w1, requires_grad = True)
>>> x1
Variable containing:
 1.2372 -0.9604  1.5415 -0.4079  0.8806
 0.0529  0.0751  0.4777 -0.6759 -2.1489
-1.1463 -0.2720  1.0066 -0.0416 -1.2853
-0.4948 -1.2964 -1.2502 -0.7693  1.6856
[torch.FloatTensor of size 4x5]

>>> z1 = x1.mean()
>>> z1.backward()
>>> x1.grad
Variable containing:
1.00000e-02 *
  5.0000  5.0000  5.0000  5.0000  5.0000
  5.0000  5.0000  5.0000  5.0000  5.0000
  5.0000  5.0000  5.0000  5.0000  5.0000
  5.0000  5.0000  5.0000  5.0000  5.0000
[torch.FloatTensor of size 4x5]

>>>
>>> x2 = torch.autograd.Variable (w2, requires_grad = True)
>>> x2
Variable containing:
 1.2372 -0.9604  1.5415 -0.4079  0.8806
 0.0529  0.0751  0.4777 -0.6759 -2.1489
-1.1463 -0.2720  1.0066 -0.0416 -1.2853
-0.4948 -1.2964 -1.2502 -0.7693  1.6856
[torch.FloatTensor of size 4x5]

>>> z2 = x2.sum()
>>> z2.backward()
>>> x2.grad
Variable containing:
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
[torch.FloatTensor of size 4x5]
```

Could you post a complete, runnable script that reproduces your issue
and let us know what version of pytorch you are running?

Best.

K. Frank

Is the question "How are the gradients computed numerically?" ?

Dear Frank
Oh, that's so terrible. I would like to re-post this again, and sorry about wasting your time…

```python
>>> import torch
>>> torch.__version__
'1.5.1+cu101'
>>> x = torch.tensor([[-0.3571,  0.1481,  0.1713, -1.2597, -0.7667],
...         [-0.1553, -0.9620,  0.0103,  3.3494,  0.2220],
...         [ 2.1131, -0.2404,  0.4820,  0.3816,  1.9752],
...         [ 1.7232, -0.5064, -0.8151,  0.3720,  0.1470]], requires_grad=True)
>>> z = x.mean()
>>> z.backward()
>>> x.grad
tensor([[0.0500, 0.0500, 0.0500, 0.0500, 0.0500],
        [0.0500, 0.0500, 0.0500, 0.0500, 0.0500],
        [0.0500, 0.0500, 0.0500, 0.0500, 0.0500],
        [0.0500, 0.0500, 0.0500, 0.0500, 0.0500]])
```

And another one would be :

```python
>>> import torch
>>> torch.__version__
'1.5.1+cu101'
>>> x = torch.tensor([[-0.3571,  0.1481,  0.1713, -1.2597, -0.7667],
...         [-0.1553, -0.9620,  0.0103,  3.3494,  0.2220],
...         [ 2.1131, -0.2404,  0.4820,  0.3816,  1.9752],
...         [ 1.7232, -0.5064, -0.8151,  0.3720,  0.1470]], requires_grad=True)
>>> z = x.sum()
>>> z.backward()
>>> x.grad
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])
```
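These corrected numbers follow a simple pattern: `mean()` hands every element a gradient of 1/n, where n is the number of elements (here 1/20 = 0.05), while `sum()` hands every element a gradient of 1. A quick sketch (not from the thread) that checks this on any random input:

```python
import torch

x = torch.randn(4, 5, requires_grad=True)

# mean() gives every element a gradient of 1/numel = 1/20 = 0.05
x.mean().backward()
print(x.grad[0, 0])  # tensor(0.0500)

x.grad = None  # gradients accumulate, so clear them before the next backward

# sum() gives every element a gradient of 1
x.sum().backward()
print(x.grad[0, 0])  # tensor(1.)
```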

And once again, thanks in advance!

Yeah, it could be, haha. I was struggling with how to give this post a good title, and it looks like I failed…

Okay, I can help you a bit there. "Gradient" is a fancy name for a derivative (differential calculus). So let's say you have a 2×2 tensor defined as:

W11 W12
W21 W22

When you take torch.mean(), the returned value is W = (W11 + W12 + W21 + W22) / 4.0, and hence the gradient matrix looks like:

d(W)/d(W11) d(W)/d(W12)
d(W)/d(W21) d(W)/d(W22)

which is:

0.25 0.25
0.25 0.25

I think you can go through the PRML book by Chris Bishop to gain further insight.
Hope that helped.
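To make the 2×2 example concrete, here is a minimal sketch (my own, not from the original post) that checks it with autograd:

```python
import torch

# 2x2 tensor of four independent variables W11, W12, W21, W22
w = torch.randn(2, 2, requires_grad=True)

# W = (W11 + W12 + W21 + W22) / 4.0
z = w.mean()
z.backward()

# d(W)/d(Wij) = 1/4 = 0.25 for every element
print(w.grad)  # tensor([[0.2500, 0.2500],
               #         [0.2500, 0.2500]])
```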


Sure, why couldn't I figure this out…
Think of every element in the tensor as an independent variable and do the basic calculus:

y = x1 + x2
dy/dx1 = 1

if y = (x1 + x2)/2
dy/dx1 = 0.5
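The same two-variable calculation can be verified with autograd on scalar tensors (a small sketch under the same setup):

```python
import torch

# two independent scalar variables
x1 = torch.tensor(3.0, requires_grad=True)
x2 = torch.tensor(5.0, requires_grad=True)

# y = x1 + x2  ->  dy/dx1 = 1
y = x1 + x2
y.backward()
print(x1.grad)  # tensor(1.)

x1.grad = None  # clear the accumulated gradient before the next backward

# y = (x1 + x2) / 2  ->  dy/dx1 = 0.5
y = (x1 + x2) / 2
y.backward()
print(x1.grad)  # tensor(0.5000)
```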

And thank you very much!