Torch.where and autograd function

I wrote two code snippets using torch.where() as shown below. The first one prints requires_grad=True, but in the second one, where I do the same thing inside a torch.autograd.Function, it shows requires_grad=False. Why is the behavior different?

First Code:
import torch
a = torch.zeros([1, 5, 5, 10], device=0, requires_grad=True)
b = torch.tensor([3], device=0, dtype=torch.float, requires_grad=True)
c = torch.tensor([4], device=0, dtype=torch.float, requires_grad=True)
d = torch.where(a > 200, b, c)
print("d requires_grad is", d.requires_grad)  # prints True

Second Code:
import torch

class fun_c(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        a = torch.zeros([1, 5, 5, 10], device=0, requires_grad=True)
        b = torch.tensor([3], device=0, dtype=torch.float, requires_grad=True)
        c = torch.tensor([4], device=0, dtype=torch.float, requires_grad=True)
        d = torch.where(a > 200, b, c)
        print("d requires_grad is", d.requires_grad)  # prints False
        return d

    @staticmethod
    def backward(ctx, grad_output):
        # Note: for saved_tensors to work, forward would also need to call
        # ctx.save_for_backward(input) on a tensor input.
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

hist = fun_c.apply
a = hist(2)

Hi,

During the forward pass, the Tensors created there have requires_grad=False because we don't track gradients for them.
The reason we don't track gradients is that you provide a backward function that tells us how to compute them for that forward, so there is no need for autograd to actually track that forward pass.
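To make that concrete, here is a minimal sketch (not from the original posts; the class name CheckGradMode is made up) showing that, on recent PyTorch versions, grad mode is turned off while the custom forward runs, while the output of apply() still requires grad because your backward gets attached to it:

import torch

class CheckGradMode(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Autograd is disabled while this forward runs, so tensors created
        # here come out with requires_grad=False.
        print("grad enabled inside forward:", torch.is_grad_enabled())  # False
        return x * 2

    @staticmethod
    def backward(ctx, grad_output):
        # d(2*x)/dx = 2
        return 2 * grad_output

x = torch.ones(3, requires_grad=True)
print("grad enabled outside:", torch.is_grad_enabled())  # True
y = CheckGradMode.apply(x)
print("y.requires_grad:", y.requires_grad)  # True, because the custom backward is attached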

OK, I understood. So let's say I want to extract some information from a custom function and use that information for the loss calculation. Can I return grad_input = torch.ones(1, 1, 1, 10) * grad_output instead of using grad_input[input < 0] = 0 in the backward function shown in the second code, for an input of shape torch.ones(1, 1, 1, 10) in the forward function, so that the loss will backpropagate and the network will converge?

One more thing I want to add: in the forward pass I extract the information using torch commands like torch.where and torch.count_nonzero, with no differentiable operations. So could I pass the gradient information of the next layer (that is, of the loss) back to the previous layer?

Yes, the backward should return a Tensor of the same size as the input containing the backpropagated gradients.
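As a rough sketch of that idea (the name ExtractInfo and the > 0 threshold are made up here, and simply passing grad_output straight through is one possible choice, not a guarantee of convergence): the forward extracts non-differentiable information with torch.where and torch.count_nonzero, and the backward returns a gradient with the same size as the input:

import torch

class ExtractInfo(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        # Non-differentiable extraction: a 0/1 mask and a count of non-zeros.
        mask = torch.where(input > 0,
                           torch.ones_like(input),
                           torch.zeros_like(input))
        print("non-zero entries:", torch.count_nonzero(mask).item())
        return mask

    @staticmethod
    def backward(ctx, grad_output):
        # Must return a Tensor with the same size as the input; here the
        # upstream gradient is passed straight through unchanged.
        return grad_output.clone()

x = torch.randn(1, 1, 1, 10, requires_grad=True)
y = ExtractInfo.apply(x)
y.sum().backward()
print(x.grad.shape)  # torch.Size([1, 1, 1, 10]), same size as the input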