The error message says that you need to provide the grad output because the output is not a scalar, right?

The thing is that autograd implements reverse-mode AD, which computes a vector-Jacobian product: the product of a given vector v with the Jacobian of the function.
This vector v is what backward expects as input.
In the particular case where your function outputs a scalar value, v has size 1, and if it contains the value 1 (which is what backward uses when you pass nothing), what you get is the Jacobian of your function, i.e. its derivatives with respect to the inputs.
When there are more outputs (in your case, y has more than one element), what you get depends on v: it is a weighted sum of the rows of the Jacobian, with the entries of v as the weights. Unfortunately, there is no natural default to use in this case, so we require the user to provide v.
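
To make this concrete, here is a minimal sketch of both cases. The tensor values, the function `x ** 2`, and the variable names are made up for illustration and are not taken from your code; the only thing it relies on is that `backward` accepts a `gradient` tensor with the same shape as the output.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Scalar output: backward() needs no argument, v is implicitly 1.
s = (x ** 2).sum()
s.backward()
grad_scalar = x.grad.clone()    # derivative of sum(x^2) is 2 * x

# Non-scalar output: backward() needs v, with the same shape as y.
x.grad = None                   # reset, since gradients accumulate
y = x ** 2                      # Jacobian of y w.r.t. x is diag(2 * x)
v = torch.tensor([1.0, 0.1, 0.01])  # illustrative weights
y.backward(v)
grad_vjp = x.grad.clone()       # weighted sum of Jacobian rows: v * 2 * x

print(grad_scalar)
print(grad_vjp)
```

If what you actually want is the gradient of the sum of the outputs, passing `torch.ones_like(y)` as v gives the same result as calling `y.sum().backward()`.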