Example for One of the differentiated Tensors appears to not have been used in the graph

The argument is not about them having different values, just that gradients really correspond to Tensor objects. So when you do a[0], you get a brand new Tensor.
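
To make that concrete, here is a minimal sketch (my own toy example, not from the thread): indexing produces a fresh Tensor that the graph of output never saw, so asking autograd for its gradient fails with the error in the title.

import torch

a = torch.arange(3.0, requires_grad=True)
output = (2 * a).sum()

# a[0] is a brand new Tensor (a view) that was never used to build `output`,
# so autograd cannot find it in the graph of `output`:
try:
    torch.autograd.grad(output, a[0])
except RuntimeError as e:
    print(e)  # "One of the differentiated Tensors appears to not have been used in the graph ..."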

What I ended up coming up with for this task of taking gradients of the loss with respect to some views of my tensor, as needed, was very hacky.

import numpy as np
import torch

a_np = np.random.random((3, 1))
# one Variable per element, so each element is its own leaf Tensor
a = [torch.autograd.Variable(torch.DoubleTensor(element), requires_grad=True) for element in a_np]

# stack them back together to give to functions that expect the full tensor
a_intermediate = torch.stack(a)

output = (2 * a_intermediate).sum()

print(torch.autograd.grad(output, a[0:]))
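
For reference, since output is just the sum of 2 * a_intermediate, this should print a tuple of three length-1 double tensors, each equal to 2, one gradient per element of the list.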

But what I had wanted to do was:

a_np = np.random.random((3, 1))
a = torch.autograd.Variable(torch.DoubleTensor(a_np), requires_grad=True)

output = (2 * a).sum()

# fails: a[:] is a new view Tensor that was never used to build `output`, so this
# raises "One of the differentiated Tensors appears to not have been used in the graph"
print(torch.autograd.grad(output, a[:]))

This was for the case where I wanted to compute the gradient of output with respect to a fully contiguous slice of its tensor inputs.
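
One way to get something close to that without the list-of-Variables hack (a sketch of my own, not from the thread): differentiate with respect to the full tensor and then index the resulting gradient, since the gradient tensor has the same shape as a.

import numpy as np
import torch

a_np = np.random.random((3, 1))
a = torch.tensor(a_np, requires_grad=True)  # float64, since a_np is float64

output = (2 * a).sum()

# gradient with respect to the full tensor, then slice the *result*
# to read off the gradient for the contiguous slice a[0:2]
(grad_a,) = torch.autograd.grad(output, a)
print(grad_a[0:2])  # tensor([[2.], [2.]], dtype=torch.float64)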

On the view discussion it seems

is really the key disconnect. I was reading this as a much stronger statement than it is, as if a view shared everything with the original tensor, graph connections included, not just memory. Instead, it appears it is purely the underlying data/values that are shared.
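
A small sketch (again my own toy example) of that distinction: a view shares storage with its base, but it is a separate Tensor object, and a graph built from the base does not connect to the view.

import torch

a = torch.arange(3.0, requires_grad=True)
v = a[0:2]  # a view: shares the underlying storage with `a`

print(v.data_ptr() == a.data_ptr())  # True: same underlying memory
print(v is a)                        # False: a distinct Tensor object

output = (2 * a).sum()
# `output` was built from `a`, not from `v`, so the graph only connects to `a`:
print(torch.autograd.grad(output, a, retain_graph=True))  # (tensor([2., 2., 2.]),)
print(torch.autograd.grad(output, v, allow_unused=True))  # (None,)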

Update: I believe this post will be helpful to others trying to understand how to calculate the Jacobian:

and

https://pytorch.org/functorch/stable/notebooks/jacobians_hessians.html
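
For completeness, a minimal sketch of the approach from that notebook, using torch.func.jacrev (the torch.func successor to functorch.jacrev; the toy function is my own):

import torch
from torch.func import jacrev

def f(x):
    return 2 * x  # elementwise, so the Jacobian is 2 * I

x = torch.randn(3, dtype=torch.double)
J = jacrev(f)(x)
print(J)  # 3x3 matrix with 2.0 on the diagonal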