Backward result in PyTorch

I am trying to calculate the Jacobian matrix of a high-dimensional tensor using the backward function in PyTorch.
My test code is below:

    t1 = torch.tensor([1.], requires_grad=True)
    t2 = torch.tensor([
        [0.1, 0.2, 0.3],
        ], requires_grad=True)

    tr = t1 + t2
    tr.backward(torch.tensor([
        [1., 0., 0.],
    ]))

    print(t1.grad.data)
    print("=======================")
    print(t2.grad.data)

And the result is:

    tensor([1.])
    =======================
    tensor([[1., 0., 0.]])

Why is the shape of dtr/dt1 not (3, 1)? Why is its shape (1)?

t1.grad will have the same shape as t1. That's true for any tensor in general: its grad's shape will be the same as its own shape.
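
As a quick check (a minimal sketch reusing the tensors from the first snippet): because t1 is broadcast across t2's last dimension, the gradient flowing back to it is summed over that dimension, which is why it keeps t1's shape rather than tr's shape.

    import torch

    t1 = torch.tensor([1.], requires_grad=True)
    t2 = torch.tensor([[0.1, 0.2, 0.3]], requires_grad=True)
    v = torch.tensor([[1., 0., 0.]])   # the "upstream" gradient passed to backward

    tr = t1 + t2                       # t1 is broadcast over t2's last dimension
    tr.backward(v)

    print(t1.grad.shape == t1.shape)   # True -> torch.Size([1])
    print(t1.grad)                     # tensor([1.]), i.e. v summed over the broadcast dim
    print(t2.grad.shape == t2.shape)   # True -> torch.Size([1, 3])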

And the gradient will be summed into the grad of the leaf node?
For example, now t2's shape is (2, 3), and the grad of t1 becomes [2.]:

    t1 = torch.tensor([1.], requires_grad=True)
    t2 = torch.tensor([
        [0.1, 0.2, 0.3],
        [0.4, 0.5, 0.6]
        ], requires_grad=True)

    tr = t1 + t2
    tr.backward(torch.tensor([
        [1., 0., 0.],
        [1., 0., 0.]
    ]))

    print(t1.grad.data)
    print("=======================")
    print(t2.grad.data)
And the result is:

    tensor([2.])
    =======================
    tensor([[1., 0., 0.],
            [1., 0., 0.]])

Hi, yes.
Unless you use zero_grad, gradients in PyTorch are accumulated by default.
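
For example (a minimal sketch using the same tensors as your first snippet), calling backward twice without zeroing doubles t1.grad, and t1.grad.zero_() resets it:

    import torch

    t1 = torch.tensor([1.], requires_grad=True)
    t2 = torch.tensor([[0.1, 0.2, 0.3]], requires_grad=True)
    v = torch.tensor([[1., 0., 0.]])

    (t1 + t2).backward(v)
    print(t1.grad)         # tensor([1.])

    (t1 + t2).backward(v)  # second pass: the new gradient is added to the old one
    print(t1.grad)         # tensor([2.])

    t1.grad.zero_()        # reset before the next pass
    (t1 + t2).backward(v)
    print(t1.grad)         # tensor([1.])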


Did you mean an interface like optimizer.zero_grad() or t.grad.zero_()? But what if my backward is a one-pass operation?
I mean, for the code below, it would be better if the gradient of t1 were ([1.]), even though t1 actually contributes to two independent elements of tr. That's because in many cases, different indices of a tensor are independent or computed in parallel.

    t1 = torch.tensor([1.], requires_grad=True)
    t2 = torch.tensor([
        [0.1, 0.2, 0.3],
        [0.4, 0.5, 0.6]
        ], requires_grad=True)

    tr = t1 + t2
    tr.backward(torch.tensor([
        [1., 0., 0.],
        [1., 0., 0.]
    ]))

    print(t1.grad.data)
    print("=======================")
    print(t2.grad.data)

Both t1.grad.zero_() and optimizer.zero_grad() will zero out the gradient.
If you don't call either of those, then t1.grad will accumulate over backward passes. If you have just one backward pass, then t1.grad will simply be the gradient from that single pass. If you call backward again (without zeroing out the grad), then t1.grad will be the old value plus the new value, and so on.
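
If what you actually want is the full Jacobian (one gradient per output element) rather than a summed vector-Jacobian product, a sketch like the one below may help. It uses torch.autograd.functional.jacobian, assuming a PyTorch version that ships it (roughly 1.5 and later):

    import torch
    from torch.autograd.functional import jacobian

    t1 = torch.tensor([1.])
    t2 = torch.tensor([[0.1, 0.2, 0.3],
                       [0.4, 0.5, 0.6]])

    # jacobian returns one Jacobian per input, shaped output.shape + input.shape
    J_t1, J_t2 = jacobian(lambda a, b: a + b, (t1, t2))

    print(J_t1.shape)  # torch.Size([2, 3, 1]): each output element has d(out)/d(t1) = 1
    print(J_t2.shape)  # torch.Size([2, 3, 2, 3])

Nothing is accumulated into .grad here, so the per-element derivatives are not summed together.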