What are the differences between these code snippets for calculating gradients with autograd?

I have been trying to understand the autograd operation, which is written in different forms across several repos I have been studying for a specific topic:

u = net(x,t) 
u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]


u = net(x,t) 
u_t = torch.autograd.grad(u, t,  grad_outputs=torch.ones_like(u), create_graph=True)[0]


u = net(x,t) 
u_t = torch.autograd.grad(u.sum(), t,  grad_outputs=torch.ones_like(u), create_graph=True)[0]

Unfortunately, the difference is not clear to me. I have two questions about this:

  1. What are the main differences between them?
  2. What is the correct way of calculating the gradients?

Thank you

The backward of .sum() is .expand(), i.e. it broadcasts the incoming gradient to the shape of u.
The first variant (by the way, I tend to write u_t, = torch.autograd.grad(...), i.e. use tuple unpacking instead of indexing) implicitly supplies a scalar 1 as the grad_outputs.
Thus the difference is that variant 1 has a first step that expands this 1 as the backward of .sum(), while variant 2 doesn't have this step. After this, the two backward passes are exactly identical.
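You can check this equivalence numerically. Below is a minimal sketch using a toy element-wise function as a stand-in for net(x, t) (the function u = x * t**2 is an assumption for illustration, not from the original post):

```python
import torch

# Toy stand-in for net(x, t): u = x * t^2, so du/dt = 2 * x * t
t = torch.linspace(0.0, 1.0, 5, requires_grad=True)
x = torch.linspace(0.0, 1.0, 5)
u = x * t**2

# Variant 1: scalar output, grad_outputs implicitly a scalar 1
u_t1, = torch.autograd.grad(u.sum(), t, create_graph=True)

# Variant 2: vector output with explicit grad_outputs of ones
u_t2, = torch.autograd.grad(u, t, grad_outputs=torch.ones_like(u), create_graph=True)

# Both give the same per-element derivative du/dt = 2 * x * t
print(torch.allclose(u_t1, u_t2))        # True
print(torch.allclose(u_t1, 2 * x * t))   # True
```

Variant 1's backward first expands the scalar 1 to the shape of u (the backward of .sum()), after which the two passes coincide.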

The third variant looks like an error to me unless u is already a scalar, because there will be a shape mismatch between the first argument (u.sum(), a scalar) and the grad_outputs (shaped like u).
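A small sketch of that failure mode (assuming a non-scalar u; the function t**2 is just a placeholder):

```python
import torch

t = torch.linspace(0.0, 1.0, 5, requires_grad=True)
u = t**2  # 5-element output, so torch.ones_like(u) has shape [5]

raised = False
try:
    # u.sum() is a scalar, but grad_outputs has shape [5]:
    # autograd rejects the shape mismatch
    torch.autograd.grad(u.sum(), t,
                        grad_outputs=torch.ones_like(u),
                        create_graph=True)
except RuntimeError as e:
    raised = True
    print(e)

print(raised)  # True
```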

Best regards


P.S.: This is a great question, I’ll re-use it as an exercise for my autograd course if you allow.


Thank you for the excellent explanation. Now everything is clear.
P.S.: Of course, you can use it wherever you want. It is my pleasure :slight_smile:
Kind regards.
