What does tensor.backward(..) do mathematically?

In your case, passing a tensor of all ones tells you how sensitive the sum of the output is with respect to each input pixel.
Passing a tensor of zeros with a single one at one pixel tells you how sensitive the predicted depth value at that pixel is with respect to each input pixel.
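
To make the two cases concrete, here is a minimal sketch. The single `Conv2d` layer is only a hypothetical stand-in for your depth network, and the 8x8 image size and the pixel index `(4, 4)` are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for a depth network: one conv layer, one output channel.
model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)

image = torch.rand(1, 3, 8, 8, requires_grad=True)
depth = model(image)  # shape (1, 1, 8, 8)

# Case 1: all ones -> gradient of depth.sum() with respect to each input pixel.
depth.backward(torch.ones_like(depth), retain_graph=True)
grad_sum = image.grad.clone()

# The same quantity, computed by summing explicitly before calling backward().
image.grad = None
depth.sum().backward(retain_graph=True)
grad_sum_explicit = image.grad.clone()

# Case 2: zeros with a single one -> gradient of one predicted depth value
# (here the output pixel at row 4, column 4) with respect to each input pixel.
image.grad = None
selector = torch.zeros_like(depth)
selector[0, 0, 4, 4] = 1.0
depth.backward(selector)
grad_pixel = image.grad.clone()

print(torch.allclose(grad_sum, grad_sum_explicit))
print(grad_pixel.nonzero().shape[0], "input values influence that output pixel")
```

With this toy 3x3 convolution, `grad_pixel` is nonzero only in the 3x3 neighbourhood around the selected pixel; a real network would have a much larger receptive field.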

Yes, there is no math formatting, unfortunately :confused:
dloss / dy = target is correct (I think): since loss is a number and y is a tensor, dloss / dy should be a tensor of the same size as y, which is target in this case. sum(target) is a single number, so it cannot be that gradient.
Assuming y and w are 1D tensors, your formulation dloss / dw = dloss / dy * dy / dw = sum(target) * dy / dw does not make sense: dloss / dw has size 1 x w.size(), but the last term has size y.size() x w.size(), because sum(target) is a scalar multiplying dy / dw. With dloss / dy = target, of size 1 x y.size(), the product target * dy / dw does come out as 1 x w.size().
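
The shape argument can be checked numerically. This is only a sketch under assumptions: I take y = A @ w with a made-up matrix `A` (so dy / dw is just `A`), and I assume the loss has the form sum(y * target), which is what makes dloss / dy = target.

```python
import torch

torch.manual_seed(0)

# Hypothetical setup: y = A @ w, so dy/dw is the (y.size() x w.size()) matrix A.
A = torch.rand(4, 3)
w = torch.rand(3, requires_grad=True)
target = torch.rand(4)

y = A @ w
loss = (y * target).sum()  # assumed loss, chosen so that dloss/dy = target

# Gradient as computed by autograd.
(grad_w,) = torch.autograd.grad(loss, w, retain_graph=True)

# Chain rule by hand: (1 x 4) times (4 x 3) gives (1 x 3).
manual = target @ A

# Passing target to backward() computes the same vector-Jacobian product.
y.backward(target)

# The sum(target) formulation has the wrong shape: a scalar times a (4 x 3) matrix.
wrong = target.sum() * A

print(grad_w.shape, manual.shape, wrong.shape)
print(torch.allclose(grad_w, manual), torch.allclose(w.grad, manual))
```

So `y.backward(target)` is the product of target with the Jacobian dy / dw, which is why the argument has to have the same size as y.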