Hi, I am new to PyTorch. Recently I encountered a problem with the function “torch.autograd.backward”. I have read the documentation, but I cannot work out what one of its parameters means. The doc says: “The graph is differentiated using the chain rule. If any of variables are non-scalar (i.e. their data has more than one element) and require gradient, the function additionally requires specifying grad_variables.” Could someone give me a simple example of a non-scalar variable and how to supply grad_variables?

In [12]: x = Variable(torch.randn(10), requires_grad=True)
In [13]: y = x ** 2
In [14]: grad = torch.randn(10)
In [15]: torch.autograd.backward([y], [grad])
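For reference, here is a runnable sketch of that session (using the modern tensor API instead of Variable) that checks numerically what the call computes — by the chain rule, x.grad comes out as grad * dy/dx = grad * 2x:

```python
import torch

x = torch.randn(10, requires_grad=True)
y = x ** 2
grad = torch.randn(10)

# backward on a non-scalar output needs the gradient w.r.t. y
torch.autograd.backward([y], [grad])

# chain rule: x.grad = grad * dy/dx = grad * 2x
print(torch.allclose(x.grad, grad * 2 * x))
```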

From what I understand, torch.autograd.backward([y], [grad]) applies the function grad to y. Is that right?

To understand the function more deeply, I constructed two examples to show what I want to do.

First, I want to compute the derivative of y with respect to every element of x. But I could not get it unless I changed y = x**2 to y = sum(x**2). Is there another way to get it?

import torch
from torch.autograd import Variable

x = Variable(torch.arange(1., 4.), requires_grad=True)
y = x ** 2
# torch.autograd.backward(x, y)  # ?? it is wrong
y.backward()  # fails here: y is non-scalar, so backward needs a gradient argument
print(x.grad)
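One way to get the elementwise derivatives without summing y — a sketch using the modern tensor API — is to pass a vector of ones as the gradient; for an elementwise function this recovers dy[i]/dx[i]:

```python
import torch

x = torch.arange(1., 4., requires_grad=True)
y = x ** 2

# a ones vector picks out dy[i]/dx[i] for this elementwise function
y.backward(torch.ones_like(y))
print(x.grad)  # tensor([2., 4., 6.])
```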

Furthermore, what if I want to compute the derivative at x = 4? How should I write the code?
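For the single-point case, a sketch: evaluate at a scalar tensor x = 4 and call backward there (dy/dx = 2x = 8):

```python
import torch

x = torch.tensor(4., requires_grad=True)
y = x ** 2
y.backward()   # y is a scalar, so no gradient argument is needed
print(x.grad)  # tensor(8.)
```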

Second, I want to compute the derivatives of both y and z with respect to every element of x. Can I group them into a single call like torch.autograd.backward([y], [grad])?

import torch
from torch.autograd import Variable

x = Variable(torch.arange(1., 4.), requires_grad=True)
y = sum(x ** 2)
z = sum(x * 3)
y.backward()
print(x.grad)
# how to set x.grad = 0??
z.backward()
print(x.grad)
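On the “set x.grad = 0” question and on grouping the two calls, a sketch assuming the modern API: x.grad.zero_() zeroes the accumulated gradient in place, and torch.autograd.backward accepts a list of scalar outputs, in which case their gradients are summed into x.grad:

```python
import torch

x = torch.arange(1., 4., requires_grad=True)
y = (x ** 2).sum()
z = (x * 3).sum()

y.backward(retain_graph=True)      # keep the graph for the combined call below
print(x.grad)                      # tensor([2., 4., 6.])

x.grad.zero_()                     # reset; gradients otherwise accumulate

z.backward(retain_graph=True)
print(x.grad)                      # tensor([3., 3., 3.])

# or do both in one call; the two gradients are summed into x.grad
x.grad.zero_()
torch.autograd.backward([y, z])
print(x.grad)                      # tensor([5., 7., 9.])
```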

From what I understand, torch.autograd.backward([y], [grad]) applies the function grad to y. Is that right?

No. This computes the gradients of all leaf nodes in the graph that y was created from. Leaf nodes are user-created Variables; in this case the only leaf node is x.

y = x ** 2

Since y is not a scalar here, the grad argument of torch.autograd.backward gives the gradient with respect to each output component y[0], y[1], y[2], etc. Hope that made it clearer.
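To make “gradient with respect to each output component” concrete, a sketch: passing a one-hot vector as grad selects the gradient of a single component. For y = x**2 the gradient of y[1] alone is [0, 2*x[1], 0]:

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
y = x ** 2

# a one-hot grad selects d(y[1])/dx = [0, 2*x[1], 0]
torch.autograd.backward([y], [torch.tensor([0., 1., 0.])])
print(x.grad)  # tensor([0., 4., 0.])
```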

Dear smth, thank you very much for your help! I understand your idea now. I also found another related problem and hope it is helpful for others. Thanks!