Hi, I am new to PyTorch. Recently I encountered a problem with the function “torch.autograd.backward”. I have read the documentation, but I cannot work out what one of its parameters means. The doc says: “The graph is differentiated using the chain rule. If any of variables are non-scalar (i.e. their data has more than one element) and require gradient, the function additionally requires specifying grad_variables.” Could someone give me a simple example of a non-scalar variable and how to supply grad_variables?

In [12]: x = Variable(torch.randn(10), requires_grad=True)
In [13]: y = x ** 2
In [14]: grad = torch.randn(10)
In [15]: torch.autograd.backward([y], [grad])
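For reference, here is a runnable sketch of that session (using the modern tensor API instead of Variable) that checks numerically what the call computes — by the chain rule, x.grad comes out as grad * dy/dx = grad * 2x:

```python
import torch

x = torch.randn(10, requires_grad=True)
y = x ** 2
grad = torch.randn(10)

# backward on a non-scalar output needs the gradient w.r.t. y
torch.autograd.backward([y], [grad])

# chain rule: x.grad = grad * dy/dx = grad * 2x
print(torch.allclose(x.grad, grad * 2 * x))
```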

From what I understand, torch.autograd.backward([y], [grad]) applies the function grad to y. Is that right?

To understand the function more deeply, I constructed two examples to show what I want to do.

First, I want to compute the derivative of y with respect to every element of x. But I could not get it unless I changed y = x**2 to y = sum(x**2). Is there another way to get it?

import torch
from torch.autograd import Variable

x = Variable(torch.arange(1., 4.), requires_grad=True)
y = x ** 2
# torch.autograd.backward(x, y)  # ?? it is wrong
y.backward()  # fails here: y is non-scalar, so backward needs a gradient argument
print(x.grad)
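One way to get the elementwise derivatives without summing y — a sketch using the modern tensor API — is to pass a vector of ones as the gradient; for an elementwise function this recovers dy[i]/dx[i]:

```python
import torch

x = torch.arange(1., 4., requires_grad=True)
y = x ** 2

# a ones vector picks out dy[i]/dx[i] for this elementwise function
y.backward(torch.ones_like(y))
print(x.grad)  # tensor([2., 4., 6.])
```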

Furthermore, what if I want to compute the derivative at x = 4? How should I write the code?
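For the single-point case, a sketch: evaluate at a scalar tensor x = 4 and call backward there (dy/dx = 2x = 8):

```python
import torch

x = torch.tensor(4., requires_grad=True)
y = x ** 2
y.backward()   # y is a scalar, so no gradient argument is needed
print(x.grad)  # tensor(8.)
```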

Second, I want to compute the derivatives of both y and z with respect to every element of x. Can I group them into a single call like torch.autograd.backward([y], [grad])?

import torch
from torch.autograd import Variable

x = Variable(torch.arange(1., 4.), requires_grad=True)
y = sum(x ** 2)
z = sum(x * 3)
y.backward()
print(x.grad)
# how to set x.grad = 0??
z.backward()
print(x.grad)
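On the “set x.grad = 0” question and on grouping the two calls, a sketch assuming the modern API: x.grad.zero_() zeroes the accumulated gradient in place, and torch.autograd.backward accepts a list of scalar outputs, in which case their gradients are summed into x.grad:

```python
import torch

x = torch.arange(1., 4., requires_grad=True)
y = (x ** 2).sum()
z = (x * 3).sum()

y.backward(retain_graph=True)      # keep the graph for the combined call below
print(x.grad)                      # tensor([2., 4., 6.])

x.grad.zero_()                     # reset; gradients otherwise accumulate

z.backward(retain_graph=True)
print(x.grad)                      # tensor([3., 3., 3.])

# or do both in one call; the two gradients are summed into x.grad
x.grad.zero_()
torch.autograd.backward([y, z])
print(x.grad)                      # tensor([5., 7., 9.])
```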

From what I understand, torch.autograd.backward([y], [grad]) applies the function grad to y. Is that right?

No. This computes the gradients of all leaf nodes in the graph that y was created from. Leaf nodes are user-created Variables; in this case the only leaf node is x.

y = x ** 2

Since y is not a scalar here, the grad argument of torch.autograd.backward gives the gradient with respect to each output component y[0], y[1], y[2], etc. Hope that made it clearer.
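To make “gradient with respect to each output component” concrete, a sketch: passing a one-hot vector as grad selects the gradient of a single component. For y = x**2 the gradient of y[1] alone is [0, 2*x[1], 0]:

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
y = x ** 2

# a one-hot grad selects d(y[1])/dx = [0, 2*x[1], 0]
torch.autograd.backward([y], [torch.tensor([0., 1., 0.])])
print(x.grad)  # tensor([0., 4., 0.])
```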

Dear smth, thank you very much for your help! I understand your idea now. I also found another related problem and hope it is helpful for others. Thanks!