When I want to get the input grad in mse_loss, what's the difference between mse_loss, backward and torch.ops.aten.mse_loss_backward?

I want to get the input gradient of mse_loss with reduction='mean'. Normally, the process is

import torch
x = torch.tensor(torch.tensor([1.,2.,3.,4.,5.]))
target = torch.tensor(torch.tensor(5.,4.,3.,2.,1.))
out = torch.nn.functional.mse_loss(x, target, reduction = 'mean')
the output is tensor([-3.2, -1.6, 0.0, 1.6, 3.2])
Besides, if I execute it like this: out.backward(torch.tensor[2,3,4,5,6])), I get a RuntimeError.

If I use mse_loss_backward to get x.grad, the process is

import torch
grad_out = torch.tensor(torch.tensor([2,3,4,5,6]))
x = torch.tensor(torch.tensor([1,2,3,4,5]))
target = torch.tensor(torch.tensor(5,4,3,2,1))
out = torch.ops.aten.mse_loss_backward(grad_output, x, target, 1)
the output is tensor([-3.2, -2.4, 0.0, 4.0, 9.6])

I want to know what the difference between the two methods is, and why they can't take the same inputs.

Your code has a lot of issues: not only does it raise warnings from the unnecessary re-wrapping of tensors via torch.tensor(torch.tensor(...)), it also fails:

target = torch.tensor(torch.tensor(5,4,3,2,1))
# TypeError: tensor() takes 1 positional argument but 5 were given

out = torch.ops.aten.mse_loss_backward(grad_output, x, target, 1)
# NameError: name 'grad_output' is not defined

out.backward(torch.tensor[2,3,4,5,6])
# TypeError: 'builtin_function_or_method' object is not subscriptable

After fixing these issues, the code works as expected. Your first example fails because with reduction='mean' the output is reduced to a single scalar, so the gradient passed to backward() has to match that scalar shape, as the error message indicates.
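To make the comparison concrete, here is a corrected sketch of both approaches. One assumption worth flagging: the aten kernel takes the reduction mode as an integer enum rather than a string, where (as far as I know) 0 = 'none', 1 = 'mean', 2 = 'sum'.

```python
import torch

x = torch.tensor([1., 2., 3., 4., 5.], requires_grad=True)
target = torch.tensor([5., 4., 3., 2., 1.])
grad_out = torch.tensor([2., 3., 4., 5., 6.])

# Autograd route: with reduction='mean', out is a scalar, so backward()
# takes either no argument (implicit gradient of 1.) or a scalar gradient.
out = torch.nn.functional.mse_loss(x, target, reduction='mean')
out.backward()
print(x.grad)  # values: [-1.6, -0.8, 0.0, 0.8, 1.6], i.e. 2*(x - target)/5

# Raw backward kernel: grad_output can have the input's shape, and the
# reduction mode is passed as the integer enum (1 = 'mean').
grad = torch.ops.aten.mse_loss_backward(grad_out, x, target, 1)
print(grad)  # values: [-3.2, -2.4, 0.0, 4.0, 9.6], i.e. grad_out * 2*(x - target)/5

# With an all-ones grad_output the kernel reproduces x.grad exactly:
ones = torch.ops.aten.mse_loss_backward(torch.ones_like(x), x, target, 1)
print(torch.allclose(ones, x.grad))  # True

# With reduction='none', out keeps the input's shape, so backward()
# accepts (and requires) a gradient of that same shape:
x2 = torch.tensor([1., 2., 3., 4., 5.], requires_grad=True)
out2 = torch.nn.functional.mse_loss(x2, target, reduction='none')
out2.backward(grad_out)
print(x2.grad)  # values: [-16., -12., 0., 20., 48.], i.e. grad_out * 2*(x2 - target)
```

So the two methods compute the same thing; the difference is only in how the incoming gradient is supplied: backward() derives it from the (scalar) loss, while mse_loss_backward lets you pass an arbitrary grad_output directly to the kernel.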