Calculating derivative of loss w.r.t. single parameter

Any indexing you do on x creates a new view tensor of the base tensor x. In terms of gradients, the flow is one-directional: if the view is used in the loss, the gradients will flow back to the base, but if the loss is computed from the base alone (which isn't aware of any of its views), the gradient w.r.t. the view is not populated. What you could do instead is index after computing the gradient (x.grad[0]), or compute the loss using the view instead of the base.
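A minimal sketch of both options, assuming a simple quadratic loss on a small leaf tensor x (the tensor values and loss are just placeholders for illustration):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Option 1: compute the loss on the base, then index into the gradient.
loss = (x ** 2).sum()
loss.backward()
print(x.grad[0])  # gradient of the loss w.r.t. x[0]

# Option 2: compute the loss through the view; gradients flow back to the base.
x.grad = None      # clear the previous gradient
v = x[0]           # a view of x
v.retain_grad()    # v is a non-leaf tensor, so its grad must be retained explicitly
loss = v ** 2
loss.backward()
print(v.grad)      # gradient w.r.t. the view itself
print(x.grad)      # the base also receives the gradient (only index 0 is nonzero)
```

Note that the reverse of Option 2 does not work: running `loss = (x ** 2).sum(); loss.backward()` will never fill in `v.grad`, since autograd only propagates from the view back to the base, not the other way around.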
