What does tensor.backward(..) do mathematically?

In your case, passing a tensor of all ones tells you how sensitive the sum of the output is with respect to each input pixel.
If you pass a tensor of zeros with a single one at one pixel, it tells you how sensitive the predicted depth value at that pixel is with respect to each input pixel.
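
To make the two cases concrete, here is a minimal sketch. The linear map `A` is a hypothetical stand-in for your network (so the expected gradients are easy to read off), `x` plays the role of the input pixels, and `y` plays the role of the predicted depth map; none of these names come from your code.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins: x = input pixels, A = a toy linear "model",
# y = predicted depth values (one per output pixel).
x = torch.randn(4, requires_grad=True)
A = torch.randn(3, 4)
y = A @ x

# Case 1: all ones -> gradient of y.sum() with respect to x.
# retain_graph=True so that backward can be called a second time.
y.backward(torch.ones_like(y), retain_graph=True)
grad_ones = x.grad.clone()
x.grad.zero_()  # gradients accumulate, so reset before the next call

# Case 2: one-hot -> gradient of the single output y[1] with respect to x.
v = torch.zeros_like(y)
v[1] = 1.0
y.backward(v)
grad_onehot = x.grad.clone()

print(grad_ones)    # for this linear toy model: the column sums of A
print(grad_onehot)  # for this linear toy model: row 1 of A
```

With a real network the numbers are of course not rows of a matrix you can inspect, but the interpretation is the same: the one-hot call picks out one row of the Jacobian dy / dx, and the all-ones call sums all rows.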

dloss / dy = target is correct (I think): loss is a number and y is a tensor, so dloss / dy should be a tensor of the same size as y, which is target in this case. sum(target) is a number and cannot correspond to that gradient:
Assuming y and w are 1D tensors, your formulation dloss / dw = dloss / dy * dy / dw = sum(target) * dy / dw does not make sense: dloss / dw has size 1 x w.size(), but the last term has size y.size() x w.size(). The product in the chain rule is a vector-matrix product between dloss / dy (1 x y.size()) and dy / dw (y.size() x w.size()), which gives the expected 1 x w.size().
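
The shape argument can be checked numerically. This is only a sketch under assumed names: `X`, `w` and `target` are made up for illustration, and `y = X @ w` is a toy model chosen so that y and w have different sizes.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy setup: y has 3 entries, w has 2 entries.
X = torch.randn(3, 2)
w = torch.randn(2, requires_grad=True)
target = torch.tensor([1.0, 2.0, 3.0])  # plays the role of dloss / dy

y = X @ w
y.backward(target)  # w.grad now holds dloss / dw

# Full Jacobian dy / dw, size y.size() x w.size() = 3 x 2.
J = torch.autograd.functional.jacobian(lambda p: X @ p, w.detach())

vjp = target @ J           # vector-matrix product, size 2
scaled = target.sum() * J  # scalar times Jacobian, size 3 x 2

print(w.grad.shape, vjp.shape, scaled.shape)
print(torch.allclose(w.grad, vjp))
```

`w.grad` matches `target @ J` and has the same size as `w`, while `sum(target) * J` still has the size of the full Jacobian, so it cannot be dloss / dw.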