Hi! Thanks for your answer, it is very well put together!
I may be wrong, but as far as I have tested, the code in my original post gives me a tensor of the same size as the input where all the rows are 0 except the one corresponding to the output I differentiate:
>>>inputs
tensor([[-1.0971, 1.1935, 1.2421],
[-1.0270, 1.1890, 1.2345],
[-0.9428, 1.1836, 1.2254],
...,
[ 1.6314, -2.7406, 0.6462],
[ 1.6619, -2.7843, 0.6395],
[ 1.7008, -2.8399, 0.6311]], device='cuda:0', grad_fn=<ViewBackward0>)
>>>outputs
tensor([[-10.9158],
[ -2.2752],
[ -5.6269],
...,
[ 2.9010],
[ 22.6586],
[ 25.9725]], device='cuda:0', grad_fn=<AddmmBackward0>)
>>>gradients=torch.autograd.grad(inputs=inputs,outputs=outputs[0],allow_unused=True,retain_graph=True)
>>>gradients
(tensor([[ 534.2734,...='cuda:0'),)
>>>gradients[0]
tensor([[ 534.2734, -2844.8074, 709.5488],
[ 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000],
...,
[ 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000]], device='cuda:0')
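For reference, this is a minimal sketch of what I mean, using a small nn.Linear as a hypothetical stand-in for my actual network (the exact model and numbers are just for illustration):

import torch

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)                      # hypothetical stand-in for my network
inputs = torch.randn(5, 3, requires_grad=True)
outputs = model(inputs)                            # shape (5, 1), one output per input row

# Differentiate only the first output: the other rows of `inputs` play no role
# in outputs[0], so their gradients come back as zeros.
gradients = torch.autograd.grad(outputs=outputs[0], inputs=inputs,
                                retain_graph=True, allow_unused=True)
print(gradients[0])                                # only row 0 is non-zero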
However, when I instead pass torch.ones_like(outputs) as grad_outputs, I get a single tensor of the same size as the input. I assumed it contains the gradient of each output with respect to its respective input row, but maybe I am wrong:
>>>gradients=torch.autograd.grad(outputs=outputs,inputs=inputs,grad_outputs=torch.ones_like(outputs),retain_graph=True)
>>>gradients
(tensor([[ 534.2734,...='cuda:0'),)
>>>gradients[0]
tensor([[ 534.2734, -2844.8074, 709.5488],
[ -247.5381, 85.9924, 537.6161],
[ 533.3944, 128.2310, 29.9466],
...,
[ -621.1329, -340.0153, 701.6570],
[ 620.2058, 1783.1162, -927.3855],
[ 387.5083, 450.1457, -69.2568]], device='cuda:0')
Here I don’t see all the gradients summed into one scalar output, and the values seem to be the same as the ones I obtain by computing them individually with the code in my original post.
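This is the kind of check that led me to that assumption; again just a sketch with a hypothetical nn.Linear in place of my real model, comparing the single grad_outputs call against the row-by-row computation:

import torch

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)                      # hypothetical stand-in for my network
inputs = torch.randn(5, 3, requires_grad=True)
outputs = model(inputs)

# One call: grad_outputs=torch.ones_like(outputs) sums d outputs[i] / d inputs over i,
# but each input row only affects its own output row, so every row of the result
# holds exactly that row's gradient.
batched = torch.autograd.grad(outputs=outputs, inputs=inputs,
                              grad_outputs=torch.ones_like(outputs),
                              retain_graph=True)[0]

# Row by row, as in my original post.
rowwise = torch.stack([
    torch.autograd.grad(outputs=outputs[i], inputs=inputs,
                        retain_graph=True)[0][i]
    for i in range(inputs.shape[0])
])

print(torch.allclose(batched, rowwise))            # True for this row-wise model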
As I said, I am not experienced in this matter and I am probably not understanding it correctly, but could you point out if there is some misconception in my reasoning?
Thanks in advance!