# How to use torch.autograd.backward when variables are non-scalar

Hi, I am new to PyTorch. Recently I ran into a problem with the function `torch.autograd.backward`. I have read its documentation, but I cannot understand one of the parameters. The doc says: "The graph is differentiated using the chain rule. If any of variables are non-scalar (i.e. their data has more than one element) and require gradient, the function additionally requires specifying grad_variables." Could someone give me a simple example of a non-scalar variable and how to supply `grad_variables`?

Thank you very much for your attention!

```
In [11]: import torch; from torch.autograd import Variable

In [12]: x = Variable(torch.randn(10), requires_grad=True)

In [13]: y = x ** 2

In [14]: torch.autograd.backward([y], [torch.randn(10)])  # one grad value per element of y
```

Dear smth, Thank you very much!

From what I understand, `torch.autograd.backward([y], [grad])` means we apply the function `grad` to `y`. Is that right?

To understand the function deeply, I construct two examples to show what I want to do.

Firstly, I want to compute the derivative of `y` with respect to every element of `x`. But I could not get it unless I changed `y = x**2` to `y = sum(x**2)`. Is there another way to do it?

```python
import torch
from torch.autograd import Variable

x = Variable(torch.randn(10), requires_grad=True)
y = x ** 2
# torch.autograd.backward(x, y)  # ?? this is wrong
y.backward()  # RuntimeError: grad can be implicitly created only for scalar outputs
```

Furthermore, what if I want to compute the derivative at `x = 4`? How should I write that code?
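For reference, evaluating the derivative at a single point such as `x = 4` can be sketched like this (a minimal example using standard PyTorch calls; a single-element output needs no `grad` argument):

```python
import torch
from torch.autograd import Variable

# evaluate dy/dx = 2 * x at the single point x = 4
x = Variable(torch.Tensor([4.0]), requires_grad=True)
y = x ** 2
y.backward()     # y has a single element, so no grad argument is needed
print(x.grad)    # tensor([8.])
```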

Secondly, I want to compute the derivatives of both `y` and `z` with respect to every element of `x`. Can I combine them into one call with `torch.autograd.backward([y], [grad])`?

```python
import torch
from torch.autograd import Variable

x = Variable(torch.randn(10), requires_grad=True)

y = (x ** 2).sum()
z = (x * 3).sum()

y.backward()

# how to set x.grad = 0??
z.backward()
```
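On the `x.grad = 0` question in the snippet above: gradients accumulate across `backward` calls, so a common pattern is to zero them in between. A sketch using the in-place `x.grad.zero_()` call:

```python
import torch
from torch.autograd import Variable

x = Variable(torch.randn(10), requires_grad=True)

y = (x ** 2).sum()
y.backward()       # x.grad now holds dy/dx = 2 * x

x.grad.zero_()     # gradients accumulate, so reset them between backward calls

z = (x * 3).sum()
z.backward()       # x.grad now holds dz/dx = 3 for every element
```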

> From what I understand, `torch.autograd.backward([y], [grad])` means we apply the function `grad` to `y`. Is that right?

No. This will compute the gradient of all leaf nodes in the graph that `y` was created by. Leaf nodes are user-created Variables. In this case the only leaf node is `x`.
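A small sketch of that point: after `backward`, only the leaf Variable `x` has its `.grad` populated, while the intermediate `y` does not retain one by default.

```python
import torch
from torch.autograd import Variable

x = Variable(torch.randn(3), requires_grad=True)   # leaf: created by the user
y = x ** 2                                         # non-leaf: produced by an op
out = y.sum()

out.backward()

print(x.grad)    # populated: d(out)/dx = 2 * x
print(y.grad)    # None: intermediate nodes do not retain .grad by default
```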

```python
y = x ** 2
```

Since `y` is not a scalar here, we pass the gradient with respect to each output component `y[0]`, `y[1]`, `y[2]`, etc. in the `grad` argument of `torch.autograd.backward`. Hope that makes it clearer.
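The explanation above can be sketched concretely: each entry of `grad` is the incoming gradient for the matching component of `y`, and the chain rule multiplies it by the local derivative `2 * x[i]`.

```python
import torch
from torch.autograd import Variable

x = Variable(torch.randn(3), requires_grad=True)
y = x ** 2                            # non-scalar output

# one incoming gradient value per output component y[0], y[1], y[2]
grad = torch.Tensor([1.0, 2.0, 3.0])
torch.autograd.backward([y], [grad])

# chain rule: x.grad[i] = grad[i] * dy[i]/dx[i] = grad[i] * 2 * x[i]
print(x.grad)
```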


Dear smth, thank you very much for your help! I understand your idea now. I also found another related problem and hope it is helpful for others. Thanks!


How do you initialize the `[grad]` vector? Can we initialize it randomly?

You can initialize it randomly or with any other gradient value you would like to pass.
However, the default value is `torch.ones`.
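To illustrate that default: passing an all-ones `grad` to a non-scalar `backward` sums the per-output gradients, which is the same result as summing the outputs first and calling `backward()` with no argument (a sketch):

```python
import torch
from torch.autograd import Variable

x = Variable(torch.randn(5), requires_grad=True)
y = x ** 2

# an all-ones grad sums the per-output gradients,
# the same as calling (x ** 2).sum().backward()
y.backward(torch.ones(5))
print(x.grad)    # equals 2 * x
```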

Thanks ptrblck