What is 2 in "grad_y_pred = 2*(y_pred - y)" in gradient calculation

Hello guys, I am new to PyTorch. I am going through an example where I found this code:

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2*(y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

There is more than one line of it that I didn't understand, so please can anyone help me by explaining it clearly? I tried some reference books, where everyone said this code is out of context.
My questions are:
1) Why did we use 2 in the first line of code, i.e. grad_y_pred = 2*(y_pred - y)?
2) In the code above, what does grad_h[h < 0] = 0 signify?
3) What is the use of grad_h = grad_h_relu.clone()?
4) Could you explain the significance of each line of the code?
Thanks in advance


This code is a bit out of context. Where did you get it from?
I am going to guess that you are trying to do the backward pass by hand for a network whose loss function is loss = ||y_pred - y||^2, with weights w1 and w2, where y_pred = h_relu * w2 and h_relu = relu(x * w1).

If this is true, then the code above computes the backward pass by hand using the chain rule, with each grad_xxx being the derivative of the loss with respect to xxx. If you write out the gradient of each formula above, you will get exactly these expressions.
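Written out explicitly (using the forward pass guessed above, so the shapes and symbols here are my assumption, with ⊙ denoting the elementwise product and 1[h > 0] the relu mask), the chain-rule steps the code implements are:

```latex
\begin{aligned}
\ell &= \lVert y_{\text{pred}} - y \rVert^2 = \sum_{ij} \left(y_{\text{pred},ij} - y_{ij}\right)^2 \\
\frac{\partial \ell}{\partial y_{\text{pred}}} &= 2\,(y_{\text{pred}} - y)
  && \text{grad\_y\_pred: the 2 comes from differentiating the square} \\
\frac{\partial \ell}{\partial w_2} &= h_{\text{relu}}^{\top}\,\frac{\partial \ell}{\partial y_{\text{pred}}}
  && \text{grad\_w2} \\
\frac{\partial \ell}{\partial h_{\text{relu}}} &= \frac{\partial \ell}{\partial y_{\text{pred}}}\,w_2^{\top}
  && \text{grad\_h\_relu} \\
\frac{\partial \ell}{\partial h} &= \frac{\partial \ell}{\partial h_{\text{relu}}} \odot \mathbf{1}[h > 0]
  && \text{grad\_h: the backward of the relu} \\
\frac{\partial \ell}{\partial w_1} &= x^{\top}\,\frac{\partial \ell}{\partial h}
  && \text{grad\_w1}
\end{aligned}
```

This also answers question 1 directly: the 2 is just the power rule applied to the squared error.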

The .clone() and [h < 0] are there because grad_h[h < 0] = 0 modifies grad_h in place, setting to 0 all the values where the relu input was negative (this is the backward of the relu). But since they don't want to modify grad_h_relu in place (it contains the derivative with respect to the output of the relu), they .clone() it first.
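You can check all of this yourself by running the snippet on a small random problem and comparing against autograd. The shapes and the forward pass below are my guesses (the original snippet does not show them), but the backward lines are copied verbatim:

```python
import torch

torch.manual_seed(0)

# Assumed shapes: x is (N, D_in), y is (N, D_out),
# w1 is (D_in, H), w2 is (H, D_out).
N, D_in, H, D_out = 4, 3, 5, 2
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

# Forward pass: h_relu = relu(x @ w1), y_pred = h_relu @ w2
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()

# Manual backward pass, exactly as in the snippet
# (inside no_grad so these ops stay out of the autograd graph)
with torch.no_grad():
    grad_y_pred = 2 * (y_pred - y)        # d(loss)/d(y_pred)
    grad_w2 = h_relu.t().mm(grad_y_pred)  # d(loss)/d(w2)
    grad_h_relu = grad_y_pred.mm(w2.t())  # d(loss)/d(h_relu)
    grad_h = grad_h_relu.clone()          # copy so grad_h_relu is untouched
    grad_h[h < 0] = 0                     # relu backward: zero where input was negative
    grad_w1 = x.t().mm(grad_h)            # d(loss)/d(w1)

# Autograd computes the same thing
loss.backward()
print(torch.allclose(grad_w1, w1.grad))
print(torch.allclose(grad_w2, w2.grad))
```

Both comparisons should come out True: the hand-written chain rule and autograd agree up to floating-point tolerance.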


In this book I found that the above code is out of context.

Thank you so much, sir, for answering my questions.
I joined a course where my instructor provided this code; that is where I found it.
So every line here calculates the gradient of a different variable involved in the forward propagation. Is that right, sir?

Yes, that's right. And it reuses the gradients of the previous variables to do so; that is the backpropagation algorithm.
