# Gradient of loss with respect to parameters

Hey everyone,

For my high school project I want to give a talk about neural networks and connect it to our calculus class. I now want to implement a small example in Python and found PyTorch for it.

Now I want to compute the gradient of the loss function with respect to the parameters w of my model f(w).

The gradient of the loss function f(x, y) with respect to the parameters w, where x is the input and y is the target, has only the dimension of x, not of y. Can anybody help me with that and maybe show me a way to fix it?

Hi,

I am a bit confused by “f(x, y) with respect to parameters w”. Does that mean that `f` also takes `w` as an input, but it is not explicit (like a model created with an `nn.Module` in PyTorch)?

Also, I don’t get the part about “now has only the dimension of x, and not y”; could you clarify, please?

Hey,

so let’s assume I just want linear regression:

input: x, output: y
y = f(x, W) = x * W

loss(f(x, W), y)

Now d loss(x, y) / dW has dimension |x| and not |x| + 1, which I thought it should have.
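The observation above can be reproduced in a few lines (a minimal sketch, assuming `x * W` means a dot product and using a squared-error loss):

```python
import torch

n = 3
x = torch.randn(n)                      # input of size |x| = 3
W = torch.randn(n, requires_grad=True)  # weights, no bias term
y_target = torch.tensor(1.0)

y_pred = torch.dot(x, W)                # y = f(x, W) = x * W
loss = (y_pred - y_target) ** 2
loss.backward()

print(W.grad.shape)  # torch.Size([3]) -- same size as x, no +1
```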

maybe I’m just confusing something

Ok.
So you don’t actually want the gradient for `x`, only for `W`, right?

I think the `+1` difference in size comes from the bias term that is often added in linear regression, which you are missing when you compute `x * W` only.
You can either have another parameter `b` of size 1 and compute `x * W + b`; then you get gradients for both `W` and `b` (this is the recommended way with NN libraries like PyTorch).
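A minimal sketch of this first approach (assuming a dot product and a squared-error loss):

```python
import torch

n = 3
x = torch.randn(n)
y_target = torch.tensor(0.5)

W = torch.randn(n, requires_grad=True)  # weights, size |x|
b = torch.zeros(1, requires_grad=True)  # separate bias, size 1

y_pred = torch.dot(x, W) + b            # x * W + b
loss = ((y_pred - y_target) ** 2).sum()
loss.backward()

print(W.grad.shape, b.grad.shape)  # torch.Size([3]) torch.Size([1])
```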

Another approach, used in more classical ML, is to append one more element to `x` in your function and make `W` of size |x| + 1:
`extended_x = torch.cat([x, torch.tensor([1.])], dim=0)` and then `y = extended_x * W`.

Hope this helps !

Hey, one more question:

so I’m now using `torch.autograd.grad(criterion(y, x), model.parameters())` and somehow the gradient is still one dimension short (so without the bias in your example; I just assumed x0 = 1 and included the bias inside `W`, sorry)… any tip?

That would depend on how and when you add this new feature.
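For instance, the constant feature x0 = 1 has to be appended *before* the forward pass, so that the extended `W` (of size |x| + 1) actually participates in the computation graph. A sketch with explicit tensors instead of the `criterion`/`model` from the post above (assuming a squared-error loss):

```python
import torch

n = 3
x = torch.randn(n)
# append the constant feature x0 = 1 *before* the forward pass
x_ext = torch.cat([x, torch.tensor([1.0])], dim=0)

W = torch.randn(n + 1, requires_grad=True)  # bias folded into W
y_target = torch.tensor(2.0)

y_pred = torch.dot(x_ext, W)
loss = (y_pred - y_target) ** 2

# torch.autograd.grad takes (outputs, inputs) and returns a tuple
(grad_W,) = torch.autograd.grad(loss, [W])
print(grad_W.shape)  # torch.Size([4]) == |x| + 1
```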