# Backward result in pytorch

I am trying to calculate the Jacobian matrix of a high-dimensional tensor using the `backward` function in PyTorch.
Here is my test code:

```python
import torch

t1 = torch.tensor([1.], requires_grad=True)
t2 = torch.tensor([[0.1, 0.2, 0.3]], requires_grad=True)

tr = t1 + t2
tr.backward(torch.tensor([[1., 0., 0.]]))

print(t1.grad)
print("=======================")
print(t2.grad)
```

And the result is here:

```
tensor([1.])
=======================
tensor([[1., 0., 0.]])
```

Why is the shape of `dtr/dt1` not (3, 1)? Why is its shape (1,)?

`t1.grad` will have the same shape as `t1`. That's true for any tensor in general: its `grad` will have the same shape as the tensor itself.
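As a sketch of why this happens: `t1` (shape `(1,)`) is broadcast to `(1, 3)` in the addition, so every output element depends on the single element of `t1`, and the backward pass sums the incoming gradient over the broadcast dimension to match `t1`'s shape. A minimal check:

```python
import torch

t1 = torch.tensor([1.], requires_grad=True)  # shape (1,)
t2 = torch.tensor([[0.1, 0.2, 0.3]])         # shape (1, 3)

tr = t1 + t2                                 # broadcast to (1, 3)
tr.backward(torch.tensor([[1., 1., 1.]]))

# The incoming (1, 3) gradient is summed over the broadcast
# dimension so that t1.grad matches t1's own shape.
print(t1.grad.shape)  # torch.Size([1])
print(t1.grad)        # tensor([3.])
```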

And the gradient will be summed into the `grad` of the leaf node?
For example, now `t2`'s shape is (2, 3), and the `grad` of `t1` becomes `[2.]`:

```python
import torch

t1 = torch.tensor([1.], requires_grad=True)
t2 = torch.tensor([[0.1, 0.2, 0.3],
                   [0.4, 0.5, 0.6]], requires_grad=True)

tr = t1 + t2
tr.backward(torch.tensor([[1., 0., 0.],
                          [1., 0., 0.]]))

print(t1.grad)
print("=======================")
print(t2.grad)
```
```
tensor([2.])
=======================
tensor([[1., 0., 0.],
        [1., 0., 0.]])
```
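Since the original goal was a Jacobian, note that the summed leaf gradient loses the per-output structure. One sketch that recovers the full Jacobian is `torch.autograd.functional.jacobian`, which takes a function and its inputs and returns a tensor of shape `output_shape + input_shape` (the lambda below is just an illustrative wrapper of the `t1 + t2` example):

```python
import torch
from torch.autograd.functional import jacobian

t1 = torch.tensor([1.])
t2 = torch.tensor([[0.1, 0.2, 0.3],
                   [0.4, 0.5, 0.6]])

# Jacobian of tr = t1 + t2 with respect to t1:
# output shape (2, 3) + input shape (1,) -> Jacobian shape (2, 3, 1)
J = jacobian(lambda x: x + t2, t1)
print(J.shape)  # torch.Size([2, 3, 1])
```

Every entry is 1 here, since each output element is `t1[0]` plus a constant.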

Hi, yes.
Unless you use `zero_grad`, gradients in PyTorch are accumulated by default.

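A minimal demonstration of this accumulation behavior, using a toy scalar-like tensor (the function `2 * t` is just an arbitrary example):

```python
import torch

t = torch.tensor([1.], requires_grad=True)

(2 * t).backward()
print(t.grad)   # tensor([2.])

# Without zeroing, the second pass accumulates on top of the first.
(2 * t).backward()
print(t.grad)   # tensor([4.])

# Zeroing in place resets the accumulator for the next pass.
t.grad.zero_()
(2 * t).backward()
print(t.grad)   # tensor([2.])
```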

Did you mean an interface like `optimizer.zero_grad()` or `t.grad.zero_()`? But what if my backward is a one-pass operation?
I mean, for the code below, it would be better if the gradient of `t1` were `[1.]`, even though `t1` actually contributes to two
independent elements of `tr`, because in many cases different indices of a tensor are independent, or computed in parallel.

```python
import torch

t1 = torch.tensor([1.], requires_grad=True)
t2 = torch.tensor([[0.1, 0.2, 0.3],
                   [0.4, 0.5, 0.6]])

tr = t1 + t2
tr.backward(torch.tensor([[1., 0., 0.],
                          [1., 0., 0.]]))
```

Both `t1.grad.zero_()` and `optimizer.zero_grad()` will zero out the gradient.
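If the goal is the gradient from each output row independently rather than their sum, one sketch (assuming you can afford one backward pass per row, rebuilding the graph each time) is to backward one row at a time and zero the leaf gradient in between:

```python
import torch

t1 = torch.tensor([1.], requires_grad=True)
t2 = torch.tensor([[0.1, 0.2, 0.3],
                   [0.4, 0.5, 0.6]])

rows = []
for i in range(2):
    tr = t1 + t2                  # recompute the graph for each pass
    g = torch.zeros_like(tr)
    g[i, 0] = 1.                  # select one output row at a time
    tr.backward(g)
    rows.append(t1.grad.clone())  # per-row contribution: tensor([1.])
    t1.grad.zero_()               # reset before the next pass

print(rows)  # [tensor([1.]), tensor([1.])]
```

This trades compute for the per-row structure; the single-pass `backward` necessarily sums all contributions into the leaf's `grad`.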