Backward for custom loss function gradient convention

I am trying to implement a new scalar loss function with two vector parameters, X and Y.
I am deriving the closed-form expression for the gradient with respect to each input, but I need to know whether to use numerator layout (∂y/∂x arranged as y xᵀ) or denominator layout (yᵀ x), since I don't know which convention PyTorch's autograd package uses.
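To make the question concrete, here is a minimal sketch of the kind of custom `torch.autograd.Function` I mean. The loss `L(x, y) = (x · y)²` is purely hypothetical, chosen only so the backward pass is easy to check; the point is that each gradient I return in `backward` must have some fixed shape relative to its input, which is where the layout question comes in:

```python
import torch

class DotLoss(torch.autograd.Function):
    """Hypothetical scalar loss L(x, y) = (x . y)^2 with two vector inputs."""

    @staticmethod
    def forward(ctx, x, y):
        ctx.save_for_backward(x, y)
        return (x @ y) ** 2

    @staticmethod
    def backward(ctx, grad_out):
        x, y = ctx.saved_tensors
        s = x @ y
        # Closed-form gradients: dL/dx = 2 (x . y) y, dL/dy = 2 (x . y) x.
        # Each returned gradient is shaped like the corresponding input.
        grad_x = grad_out * 2 * s * y
        grad_y = grad_out * 2 * s * x
        return grad_x, grad_y

x = torch.randn(5, requires_grad=True)
y = torch.randn(5, requires_grad=True)
loss = DotLoss.apply(x, y)
loss.backward()
```

In practice autograd expects each gradient returned from `backward` to have the same shape as the corresponding input (a "gradient shaped like the variable" convention), which is what the sketch above assumes.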

Also, on a side note: throughout my derivations I run into the derivative of a matrix with respect to a vector, which is a third-order tensor. How do you suggest I handle that in PyTorch? Does it support generalized Jacobians, or should I flatten the matrix into a vector and then compute an ordinary Jacobian?
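For what it's worth, `torch.autograd.functional.jacobian` does seem to return a generalized Jacobian without any manual flattening: the result's shape is the output shape followed by the input shape. A small sketch with a hypothetical matrix-valued function (the outer product `f(v) = v vᵀ`, chosen only because its derivative is easy to verify by hand):

```python
import torch
from torch.autograd.functional import jacobian

def f(v):
    # Hypothetical matrix-valued function of a vector: f(v) = v v^T, shape (3, 3).
    return torch.outer(v, v)

v = torch.randn(3)
J = jacobian(f, v)
# J has shape (3, 3, 3): the (3, 3) output shape followed by the (3,) input shape.
# Entry J[i, j, k] = d(v_i * v_j) / d(v_k) = delta_ik * v_j + delta_jk * v_i.
print(J.shape)
```

If you prefer the classical 2-D Jacobian, you can still reshape `J` to `(9, 3)` afterwards, which is equivalent to flattening the matrix output first.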