# What is 2 in "grad_y_pred = 2*(y_pred - y)" in gradient calculation

Hello guys, I am new to PyTorch. I am going through an example where I found this code:

```python
# Backprop to compute gradients of w1 and w2 with respect to loss
grad_y_pred = 2 * (y_pred - y)
grad_w2 = h_relu.t().mm(grad_y_pred)
grad_h_relu = grad_y_pred.mm(w2.t())
grad_h = grad_h_relu.clone()
grad_h[h < 0] = 0
grad_w1 = x.t().mm(grad_h)
```

I didn’t understand several lines of it, so could someone please explain them clearly? I tried some reference books, but everyone said this code is out of context.
My questions are:
1) Why is 2 used in the first line of the code, i.e. `grad_y_pred = 2*(y_pred - y)`?
2) What does `grad_h[h < 0] = 0` signify?
3) What is the use of `grad_h = grad_h_relu.clone()`?
4) Could you go over the entire code, explaining the significance of each line?

Hi,

This code is a bit out of context. Where did you get it from?
I am going to guess that you are trying to do the backward pass by hand for a network whose loss function is `loss = ||y_pred - y||^2`, with weights `w1` and `w2`, where `y_pred = w2 * h_relu` and `h_relu = relu(w1 * x)`.

If this is true, then the code above computes the backward pass by hand using the chain rule, with each `grad_xxx` being the derivative of the loss w.r.t. `xxx`. If you write out the gradient of each formula above, you will get exactly these expressions.
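To make the chain rule concrete, here is a small NumPy sketch of the same forward and backward pass (the shapes and random values are made up for illustration), with a finite-difference check that the hand-derived gradient is correct. The factor 2 in `grad_y_pred` comes from differentiating the squared error: `d/du (u^2) = 2u`.

```python
import numpy as np

# Assumed two-layer model from the answer above:
#   h = x @ w1, h_relu = relu(h), y_pred = h_relu @ w2,
#   loss = sum((y_pred - y)**2)
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))
y = rng.standard_normal((4, 2))
w1 = rng.standard_normal((3, 5))
w2 = rng.standard_normal((5, 2))

# Forward pass
h = x @ w1
h_relu = np.maximum(h, 0)
y_pred = h_relu @ w2
loss = np.sum((y_pred - y) ** 2)

# Backward pass by hand (chain rule)
grad_y_pred = 2.0 * (y_pred - y)   # d/du (u^2) = 2u, hence the 2
grad_w2 = h_relu.T @ grad_y_pred
grad_h_relu = grad_y_pred @ w2.T
grad_h = grad_h_relu.copy()        # NumPy's analogue of torch's .clone()
grad_h[h < 0] = 0                  # backward of the ReLU
grad_w1 = x.T @ grad_h

# Finite-difference check on one entry of w1
eps = 1e-6
w1_pert = w1.copy()
w1_pert[0, 0] += eps
loss_pert = np.sum((np.maximum(x @ w1_pert, 0) @ w2 - y) ** 2)
numeric = (loss_pert - loss) / eps
print(np.allclose(numeric, grad_w1[0, 0], atol=1e-3))
```

If the hand-derived expressions are right, the numerical estimate agrees with `grad_w1[0, 0]`.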

The `.clone()` and `[h < 0]` are there because `grad_h[h < 0] = 0` modifies `grad_h` in place, setting to 0 all the values where the input was negative (this is the backward of the ReLU). Because they don’t want to modify `grad_h_relu` in place (it contains the derivative of the loss w.r.t. the output of the ReLU), they `.clone()` it first.
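A tiny NumPy illustration (toy values, with `.copy()` playing the role of torch's `.clone()`) shows what would go wrong without the copy: the in-place masking would clobber `grad_h_relu` itself.

```python
import numpy as np

h = np.array([-1.0, 2.0, -3.0, 4.0])
grad_h_relu = np.array([10.0, 20.0, 30.0, 40.0])

# Without a copy, both names refer to the same memory,
# so masking one also changes the other:
grad_h_alias = grad_h_relu
grad_h_alias[h < 0] = 0
print(grad_h_relu)  # the entries at negative h are now 0 here too

# With a copy, the original gradient survives the in-place masking:
grad_h_relu = np.array([10.0, 20.0, 30.0, 40.0])
grad_h = grad_h_relu.copy()
grad_h[h < 0] = 0
print(grad_h_relu)  # untouched
print(grad_h)       # zeroed where h was negative
```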


Thank you so much, sir, for answering my questions.
I joined a course where my instructor provided this code; that is where I found it.
So every line here calculates the gradient of a different variable involved in the forward propagation. Is that right, sir?

Yes, that’s right. And each line reuses the gradients of the previous variables to do so; that is the backpropagation algorithm.
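In other words, under the model guessed above, each gradient is built from the one computed just before it (a sketch of the chain, not exact matrix notation):

```
grad_y_pred = dloss/dy_pred = 2 * (y_pred - y)
grad_w2     = dloss/dw2     = h_relu^T * grad_y_pred          (reuses grad_y_pred)
grad_h_relu = dloss/dh_relu = grad_y_pred * w2^T              (reuses grad_y_pred)
grad_h      = dloss/dh      = grad_h_relu, zeroed where h < 0 (reuses grad_h_relu)
grad_w1     = dloss/dw1     = x^T * grad_h                    (reuses grad_h)
```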
