Hi @sanchit2843,
I believe what Thomas meant in one of his replies is:
You should separate the forward and backward phases.
Inside forward there is no gradient calculation. Forward is one side of the PyTorch coin and backward is the other. The backward phase is where the gradients are calculated, and usually this looks like a backward call on the loss, e.g. `loss.backward()`.
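The usual shape of this separation can be sketched like so (a minimal sketch with a hypothetical tiny model and random data, just to show the two phases):

```python
import torch
import torch.nn as nn

# Hypothetical tiny model and data, only to illustrate the two phases.
model = nn.Linear(4, 1)
inputs = torch.randn(8, 4)
targets = torch.randn(8, 1)

# Forward phase: compute predictions and the loss. No gradients exist yet.
outputs = model(inputs)
loss = nn.functional.mse_loss(outputs, targets)
print(model.weight.grad)  # None: backward has not run yet

# Backward phase: gradients are computed here.
loss.backward()
print(model.weight.grad.shape)  # same shape as model.weight
```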
You can always check what is going on by asking for the gradient on a tensor.
If you have a tensor `t`, you can ask for `t.grad`. A tensor and its gradient have the same shape, whenever the gradient is not `None`.
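You can verify that shape claim directly (a quick sketch, using `sum()` just to get a scalar to call backward on):

```python
import torch

t = torch.randn(3, 2, requires_grad=True)
t.sum().backward()       # gradient of a sum is all ones
print(t.shape)           # torch.Size([3, 2])
print(t.grad.shape)      # torch.Size([3, 2]) -- same shape as t
```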
Here is an example for you. You create a tensor `x` and then ask for three more things:
```python
import torch

x = torch.ones(2, 2, requires_grad=True)
print(x)
print(x.grad)
print(x.grad_fn)
print(x.is_leaf)
```
And the output of this will be:

```
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
None
None
True
```
Now, you create `y`:

```python
y = x + 2
print(y)
print(y.grad)
print(y.grad_fn)
print(y.is_leaf)
```
Out:

```
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
None
<AddBackward0 object at 0x0000028C403BC198>
False
```
So what you see is that there is no gradient in `y`. Can you tell me why? But let's go further.
```python
loss = torch.mean(y)
loss.backward()
print(y)
print(y.grad)
print(y.grad_fn)
print(y.is_leaf)
```
Out:

```
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
None
<AddBackward0 object at 0x0000028C411582E8>
False
```
Still, no gradient in `y`. But look, there is no gradient in `loss` either:
```python
print(loss)
print(loss.grad)
print(loss.grad_fn)
print(loss.is_leaf)
```

Out:

```
tensor(3., grad_fn=<MeanBackward0>)
None
<MeanBackward0 object at 0x0000028C403BC6D8>
False
```
So where is the gradient?
```python
print(x)
print(x.grad)
print(x.grad_fn)
```

Out:

```
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
tensor([[0.2500, 0.2500],
        [0.2500, 0.2500]])
None
```
It is calculated for `x`. Now you will probably go to the definition of `backward()` and find this in the docs:

> Computes the sum of gradients of given tensors w.r.t. graph leaves.

So, by definition, `backward()` computes the gradients for the leaves.
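That is also the answer to the question about `y` above: autograd only keeps `.grad` on leaves by default. If you do want the gradient on a non-leaf tensor, you can call `retain_grad()` on it before backward. A small sketch of the same example with that change:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
y.retain_grad()          # ask autograd to keep .grad on this non-leaf tensor
loss = torch.mean(y)
loss.backward()
print(y.grad)            # now populated: 0.25 in every position
```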
Here is one example where you can set manual gradients:

```python
x = torch.ones(2, 2, requires_grad=True)
y = x * 2
y.backward(gradient=torch.tensor([[0.5, 1.], [1.5, 2.]]))
print(x)
print(x.grad)
print(x.grad_fn)
print(x.is_leaf)
```
Out:

```
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
tensor([[1., 2.],
        [3., 4.]])
None
True
```
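One way to read the `gradient=` argument: it is the vector in a vector-Jacobian product. A sketch of that interpretation, showing that `y.backward(gradient=v)` gives the same `x.grad` as backing through the scalar `(y * v).sum()`:

```python
import torch

v = torch.tensor([[0.5, 1.0], [1.5, 2.0]])

# Variant 1: pass v as the gradient argument.
x1 = torch.ones(2, 2, requires_grad=True)
y1 = x1 * 2
y1.backward(gradient=v)

# Variant 2: reduce to a scalar weighted by v, then call plain backward().
x2 = torch.ones(2, 2, requires_grad=True)
y2 = x2 * 2
(y2 * v).sum().backward()

print(x1.grad)                           # tensor([[1., 2.], [3., 4.]])
print(torch.equal(x1.grad, x2.grad))     # True
```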
This shows that the backward computation is a little more complicated than you may expect. You can set manual gradients, but this cannot be done without a graph behind it.
Here is what I mean by "without a graph behind it". If you try this yourself:
```python
a = 1
b = 1
c = 1
d = 1

def f(a, b, c, d):
    return 2 * (a + b + c + d)

e = 0.01  # small number

# numerical gradient df/da
da = (f(a + e, b, c, d) - f(a, b, c, d)) / e
print(da)
```
This will output:

```
1.9999999999999574
```
But to calculate the gradient for `a` we haven't created anything like a graph. We just used calculus.
So, I think there is probably no point in using `torch.no_grad()` inside `forward()`.