Gradient of loss with respect to parameters

Hey everyone,

For my high school project I want to give a talk about neural networks and connect it to our calculus class. I now want to implement a small example in Python and found PyTorch for it.

Now I want to compute the gradient of the loss function with respect to the parameters w of my model f(w).

The gradient of the loss function f(x,y) wrt parameters w, where x is the input and y is the target, now has only the dimension of x, and not y. Can anybody help me with that and maybe show me a way to fix it?
Thanks in advance!

Hi,

I am a bit confused by the notation: you write the loss as f(x,y) but take the gradient with respect to the parameters w. Does that mean that f also takes w as an input, just not explicitly (like a model created with an nn.Module in PyTorch)?

Also, I don’t understand “now has only the dimension of x, and not y”. Could you clarify, please?

Hey,

so let’s assume I just want linear regression,

input: x, output: y
y = f(x,W) = x*W

loss(f(x,W), y)

Now dloss/dW has dimension |x| and not |x|+1, which is what I thought it should be.

Maybe I’m just confusing something.

Ok.
So you don’t actually want the gradient for x, only for W, right?

I think the +1 difference in size comes from the bias term that is often added in linear regression, which is missing when you compute x*W only.
You can add another parameter b of size 1 and compute x*W + b. You then get gradients for both W and b (this is the recommended way with NN-based libraries like PyTorch).
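To make the first option concrete, here is a minimal sketch. The input size of 3, the target value, and the squared-error loss are assumptions picked only for illustration, and x*W is read here as a dot product (`x @ W`) so that the prediction is a single number.

```python
import torch

torch.manual_seed(0)

n_features = 3                      # hypothetical input size |x|
x = torch.randn(n_features)         # a single input vector
y = torch.tensor(2.0)               # a single scalar target (made up)

# Two separate parameters: W of size |x| and a bias b of size 1.
W = torch.randn(n_features, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

pred = x @ W + b                    # linear model with an explicit bias
loss = ((pred - y) ** 2).sum()      # squared error, assumed for illustration

# One gradient tensor per parameter, in the order the parameters are passed.
grad_W, grad_b = torch.autograd.grad(loss, [W, b])

print(grad_W.shape)                 # gradient for W has |x| entries
print(grad_b.shape)                 # gradient for b has 1 entry
```

So the |x|+1 numbers are all there, they are just split across two tensors.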

Another approach, used more in classical ML, is to append a constant 1 to x in your function and make W of size |x|+1:
extended_x = torch.cat([x, torch.tensor([1.])], dim=0) and then y = extended_x * W.
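A minimal sketch of this second option, under the same illustrative assumptions (input size 3, a made-up scalar target, squared-error loss, and the product taken as a dot product):

```python
import torch

torch.manual_seed(0)

n_features = 3                      # hypothetical input size |x|
x = torch.randn(n_features)
y = torch.tensor(2.0)               # made-up scalar target

# Append a constant 1 so that the last entry of W plays the role of the bias.
extended_x = torch.cat([x, torch.tensor([1.0])], dim=0)   # size |x| + 1
W = torch.randn(n_features + 1, requires_grad=True)       # size |x| + 1

pred = extended_x @ W
loss = (pred - y) ** 2              # squared error, assumed for illustration

# A single parameter, so a single gradient tensor of size |x| + 1.
(grad_W,) = torch.autograd.grad(loss, [W])

print(grad_W.shape)
```

The last entry of `grad_W` is the derivative with respect to the bias, since it multiplies the constant 1.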

Hope this helps!

Hey, one more question:

So I’m now using torch.autograd.grad(outputs=criterion(y, x), inputs=model.parameters()) and somehow the dimension is still X-1 (so without the bias in your example; I just assumed x0=1 and included the bias inside W, sorry)… any tips?

That would depend on how and when you add this new feature.
Can you share your code?
If you use the model version, you most likely have 2 parameters: one of size X-1 and one of size 1.
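For reference, here is a small sketch of what I mean, assuming the model is a single `nn.Linear` layer with an `nn.MSELoss` criterion (I have not seen your code, so the model, the sizes, and the random data are all guesses). `torch.autograd.grad` returns one tensor per parameter, so the bias gradient is the second element of the result rather than an extra entry in the first.

```python
import torch
from torch import nn

torch.manual_seed(0)

n_features = 3                          # hypothetical number of input features
model = nn.Linear(n_features, 1)        # holds a weight of shape (1, 3) and a bias of shape (1,)
criterion = nn.MSELoss()

x = torch.randn(5, n_features)          # made-up batch of 5 inputs
y = torch.randn(5, 1)                   # made-up targets

loss = criterion(model(x), y)

# Pass the parameters as a list; the result has one gradient per parameter.
params = list(model.parameters())
grads = torch.autograd.grad(loss, params)

for (name, _), g in zip(model.named_parameters(), grads):
    print(name, tuple(g.shape))

# Flatten and concatenate to get a single vector with n_features + 1 entries.
flat_grad = torch.cat([g.reshape(-1) for g in grads])
print(flat_grad.shape)
```

If you only look at `grads[0]` you will see the weight gradient alone, which would explain a size that is one short.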