Understanding gradient values after a backward pass

Hi, I realize that I don't understand how the gradient values are computed after calling backward(). For example:

x = torch.randn((4, 5), requires_grad=True)
z = x.mean()
z.backward()
print(x.grad)

Assume x has the following values:

>>> x
tensor([[-0.3571,  0.1481,  0.1713, -1.2597, -0.7667],
        [-0.1553, -0.9620,  0.0103,  3.3494,  0.2220],
        [ 2.1131, -0.2404,  0.4820,  0.3816,  1.9752],
        [ 1.7232, -0.5064, -0.8151,  0.3720,  0.1470]], requires_grad=True)
>>> 

Then x.grad returns:

tensor([[1.1500, 1.1500, 1.1500, 1.1500, 1.1500],
        [1.1500, 1.1500, 1.1500, 1.1500, 1.1500],
        [1.1500, 1.1500, 1.1500, 1.1500, 1.1500],
        [1.1500, 1.1500, 1.1500, 1.1500, 1.1500]])

And if I change z = x.mean() to z = x.sum(), x.grad becomes:

tensor([[2.1500, 2.1500, 2.1500, 2.1500, 2.1500],
        [2.1500, 2.1500, 2.1500, 2.1500, 2.1500],
        [2.1500, 2.1500, 2.1500, 2.1500, 2.1500],
        [2.1500, 2.1500, 2.1500, 2.1500, 2.1500]])

I would like to know how the values 1.1500 and 2.1500 are computed.

Thanks in advance!

Hi Two!

The gradients you report are clearly wrong; I cannot reproduce them with
this pytorch-version-0.3.0 test script:

import torch
torch.__version__

torch.manual_seed (2020)

w1 = torch.randn (4,5)
w2 = w1.clone()

x1 = torch.autograd.Variable (w1, requires_grad = True)
x1
z1 = x1.mean()
z1.backward()
x1.grad

x2 = torch.autograd.Variable (w2, requires_grad = True)
x2
z2 = x2.sum()
z2.backward()
x2.grad

Here is the output showing the correct gradients:

>>> import torch
>>> torch.__version__
'0.3.0b0+591e73e'
>>>
>>> torch.manual_seed (2020)
<torch._C.Generator object at 0x000001E2AF8B6630>
>>>
>>> w1 = torch.randn (4,5)
>>> w2 = w1.clone()
>>>
>>> x1 = torch.autograd.Variable (w1, requires_grad = True)
>>> x1
Variable containing:
 1.2372 -0.9604  1.5415 -0.4079  0.8806
 0.0529  0.0751  0.4777 -0.6759 -2.1489
-1.1463 -0.2720  1.0066 -0.0416 -1.2853
-0.4948 -1.2964 -1.2502 -0.7693  1.6856
[torch.FloatTensor of size 4x5]

>>> z1 = x1.mean()
>>> z1.backward()
>>> x1.grad
Variable containing:
1.00000e-02 *
  5.0000  5.0000  5.0000  5.0000  5.0000
  5.0000  5.0000  5.0000  5.0000  5.0000
  5.0000  5.0000  5.0000  5.0000  5.0000
  5.0000  5.0000  5.0000  5.0000  5.0000
[torch.FloatTensor of size 4x5]

>>>
>>> x2 = torch.autograd.Variable (w2, requires_grad = True)
>>> x2
Variable containing:
 1.2372 -0.9604  1.5415 -0.4079  0.8806
 0.0529  0.0751  0.4777 -0.6759 -2.1489
-1.1463 -0.2720  1.0066 -0.0416 -1.2853
-0.4948 -1.2964 -1.2502 -0.7693  1.6856
[torch.FloatTensor of size 4x5]

>>> z2 = x2.sum()
>>> z2.backward()
>>> x2.grad
Variable containing:
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
[torch.FloatTensor of size 4x5]

Could you post a complete, runnable script that reproduces your issue,
and let us know what version of pytorch you are running?

Best.

K. Frank

Is the question “How are the gradients computed numerically?”

Dear Frank :slight_smile:
Oh, that's embarrassing. I'll re-post it here, and sorry for wasting your time…

>>> import torch
>>> torch.__version__
'1.5.1+cu101'
>>> x = torch.tensor([[-0.3571,  0.1481,  0.1713, -1.2597, -0.7667],
...         [-0.1553, -0.9620,  0.0103,  3.3494,  0.2220],
...         [ 2.1131, -0.2404,  0.4820,  0.3816,  1.9752],
...         [ 1.7232, -0.5064, -0.8151,  0.3720,  0.1470]], requires_grad=True)
>>> z = x.mean()
>>> z.backward()
>>> x.grad
tensor([[0.0500, 0.0500, 0.0500, 0.0500, 0.0500],
        [0.0500, 0.0500, 0.0500, 0.0500, 0.0500],
        [0.0500, 0.0500, 0.0500, 0.0500, 0.0500],
        [0.0500, 0.0500, 0.0500, 0.0500, 0.0500]])

And another one would be:

>>> import torch 
>>> torch.__version__
'1.5.1+cu101'
>>> x = torch.tensor([[-0.3571,  0.1481,  0.1713, -1.2597, -0.7667],
...         [-0.1553, -0.9620,  0.0103,  3.3494,  0.2220],
...         [ 2.1131, -0.2404,  0.4820,  0.3816,  1.9752],
...         [ 1.7232, -0.5064, -0.8151,  0.3720,  0.1470]], requires_grad=True)
>>> z = x.sum()
>>> z.backward()
>>> x.grad
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])

And once again, thanks in advance!

Yeah, it could be, haha. I was struggling to come up with a good title for this post, and it looks like I failed…

Okay, I can help you a bit there. “Gradient” is a fancy name for the derivative (differential calculus). So let's say you have a 2x2 tensor defined as:

W11 W12
W21 W22

When you take torch.mean(), the returned variable is W = (W11 + W12 + W21 + W22) / 4.0, and hence the gradient matrix looks like:

d(W)/d(W11) d(W)/d(W12)
d(W)/d(W21) d(W)/d(W22)

which is:

0.25 0.25
0.25 0.25

I think you can go through the PRML book by Chris Bishop to gain further insight.
Hope that helped.
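
To make that concrete, here is a minimal autograd check of the 2x2 case (my own sketch; the tensor values are arbitrary, since the gradients of mean() and sum() do not depend on them):

import torch

# Arbitrary 2x2 tensor; only requires_grad matters for this check.
w = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]], requires_grad=True)

z = w.mean()     # W = (W11 + W12 + W21 + W22) / 4
z.backward()
print(w.grad)    # tensor([[0.2500, 0.2500],
                 #         [0.2500, 0.2500]])

w.grad = None    # reset the accumulated gradient before the next backward()
z = w.sum()      # W = W11 + W12 + W21 + W22
z.backward()
print(w.grad)    # tensor([[1., 1.],
                 #         [1., 1.]])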

Sure, why couldn't I figure this out…
Think of every element in the tensor as an independent variable and do the basic calculus:

y = x1 + x2
dy/dx1 = 1

and if y = (x1 + x2) / 2,
dy/dx1 = 0.5
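
A quick autograd check of those two scalar cases (my own sketch; the input values 3.0 and 5.0 are arbitrary):

import torch

x1 = torch.tensor(3.0, requires_grad=True)   # arbitrary value
x2 = torch.tensor(5.0, requires_grad=True)   # arbitrary value

y = x1 + x2
y.backward()
print(x1.grad)   # tensor(1.)

x1.grad = None   # reset before the second case
x2.grad = None
y = (x1 + x2) / 2
y.backward()
print(x1.grad)   # tensor(0.5000)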

And thank you very much!