The argument is not about them having different values, just that gradients really correspond to Tensors. So when you do a[0], you get a brand new Tensor.
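To make that concrete, here is a minimal sketch (variable names are my own) showing that indexing produces a distinct, non-leaf Tensor, so asking autograd for a gradient with respect to a slice taken after the fact fails:

import torch

a = torch.rand(3, 1, dtype=torch.float64, requires_grad=True)
v = a[0]                      # indexing returns a brand new Tensor
print(v is a, v.is_leaf)      # False False -- a separate, non-leaf node
print(v.grad_fn)              # e.g. <SelectBackward0 ...> -- built from a, not an input in its own right

output = (2 * a).sum()
# torch.autograd.grad(output, v)  # would raise: v was not used to compute output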
What I ended up coming up with for this task of taking gradients of a loss with respect to some views of my tensor was very hacky:
import numpy as np
import torch

a_np = np.random.random((3, 1))
# wrap each element in its own leaf tensor so gradients can be taken per element
a = [torch.autograd.Variable(torch.DoubleTensor(element), requires_grad=True) for element in a_np]
# stack back into a single tensor to give to functions that expect the full tensor
a_intermediate = torch.stack(a)
output = (2 * a_intermediate).sum()
# gradients with respect to any subset of the per-element tensors
print(torch.autograd.grad(output, a[0:]))
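This works because each element of a is its own leaf tensor with requires_grad=True, and torch.stack records the stacking in the autograd graph, so gradients flow back to every per-element tensor individually.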
But I had wanted to do
a_np = np.random.random((3, 1))
a = torch.autograd.Variable(torch.DoubleTensor(a_np), requires_grad=True)
output = (2 * a).sum()
print(torch.autograd.grad(output, a[:]))  # fails: a[:] is a new Tensor that was not used to compute output
This is for the case where I wanted to compute the gradient of the output with respect to a fully contiguous slice of its tensor inputs.
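What does work, as far as I can tell, is asking for the gradient with respect to the full leaf tensor and then slicing the resulting gradient; a minimal sketch (names are my own):

import numpy as np
import torch

a_np = np.random.random((3, 1))
a = torch.tensor(a_np, dtype=torch.float64, requires_grad=True)
output = (2 * a).sum()
grad_a, = torch.autograd.grad(output, a)  # gradient w.r.t. the whole leaf tensor
print(grad_a[0:2])                        # slicing the gradient stands in for "grad w.r.t. a[0:2]"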
On the view discussion it seems
is really the key disconnect. I was reading this as a much stronger statement than it is, as if a view shared everything with the original tensor, including its autograd graph connections and not just its memory. Instead it appears it is purely the underlying data/values that are shared.
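A quick way to see the distinction (my own illustration): a view shares storage with the original tensor, but it is a separate Tensor node as far as autograd is concerned:

import torch

a = torch.rand(3, 1, dtype=torch.float64, requires_grad=True)
view = a[0:2]
print(view.data_ptr() == a.data_ptr())  # True -- same underlying storage
print(view.is_leaf, view.grad_fn)       # False, <SliceBackward0 ...> -- a distinct node built from a

output = (2 * a).sum()
# torch.autograd.grad(output, view)  # would raise: view was not used to compute output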
Update: I believe this post, along with https://pytorch.org/functorch/stable/notebooks/jacobians_hessians.html, will be beneficial to others for understanding how to calculate Jacobians.
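For anyone landing here later, a minimal sketch of that approach, assuming a PyTorch build that exposes jacrev (torch.func in PyTorch 2.x, functorch in earlier releases):

import torch
from torch.func import jacrev  # use `from functorch import jacrev` on older versions

def f(x):
    return (2 * x).sum()

x = torch.rand(3, 1, dtype=torch.float64)
jac = jacrev(f)(x)   # Jacobian of the scalar output w.r.t. every element of x
print(jac)           # shape (3, 1), every entry 2.0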