Hello guys, I am new to PyTorch. I am going through an example where I found this code:
# Backprop to compute gradients of w1 and w2 with respect to loss
grad_y_pred = 2 * (y_pred - y)        # gradient of loss wrt y_pred
grad_w2 = h_relu.t().mm(grad_y_pred)  # gradient of loss wrt w2
grad_h_relu = grad_y_pred.mm(w2.t())  # gradient of loss wrt h_relu
grad_h = grad_h_relu.clone()
grad_h[h < 0] = 0
grad_w1 = x.t().mm(grad_h)            # gradient of loss wrt w1
There are several lines of it that I didn’t understand, so can anyone please explain it clearly? I tried some reference books, but everyone said that this code is out of context.
My questions are:
1) Why is there a 2 in the first line of code, i.e. grad_y_pred = 2 * (y_pred - y)?
2) In the code above, what does grad_h[h < 0] = 0 signify?
3) What is the use of grad_h = grad_h_relu.clone()?
4) Please explain the entire code, with the significance of each line, point by point.
Thanks in advance
This code is a bit out of context. Where did you get it from?
So I am going to guess that you are trying to do the backward pass by hand for a network whose loss function is loss = ||y_pred - y||^2, with weights w1 and w2, where y_pred = h_relu.mm(w2) and h_relu = relu(x.mm(w1)).
If this is true, then the code above computes the backward pass using the chain rule by hand.
Each grad_xxx is the derivative of the loss wrt xxx. If you write out the gradient of each formula above, you will get exactly these expressions.
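For example, writing the chain rule out under the setup I guessed above (^T denotes transpose, @ is matrix multiplication):

grad_y_pred = dloss/dy_pred = 2 * (y_pred - y)   # the 2 comes from differentiating the square
grad_w2     = dloss/dw2     = h_relu^T @ grad_y_pred
grad_h_relu = dloss/dh_relu = grad_y_pred @ w2^T
grad_h      = dloss/dh      = grad_h_relu, with entries where h < 0 set to 0
grad_w1     = dloss/dw1     = x^T @ grad_h

That first line answers your question 1: the 2 is just the derivative of the square in the loss.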
The .clone() and the [h < 0] are there because grad_h[h < 0] = 0 changes grad_h in place, setting to 0 all the values where the input to the relu was negative (this is the backward of the relu). But since they don’t want to modify grad_h_relu in place (it contains the derivative of the loss wrt the output of the relu), they .clone() it first.
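Here is a tiny illustration of the aliasing problem that .clone() avoids (the tensor values are made up):

import torch

t = torch.tensor([1.0, -2.0, 3.0])
alias = t          # not a copy: both names refer to the same tensor
alias[t < 0] = 0   # in-place write also changes t -> tensor([1., 0., 3.])

t2 = torch.tensor([1.0, -2.0, 3.0])
copy = t2.clone()  # independent copy
copy[t2 < 0] = 0   # t2 is untouched: still tensor([1., -2., 3.])

And for context, your snippet looks like it comes from the manual-backprop example in the PyTorch “Learning PyTorch with Examples” tutorial. A minimal sketch of the full loop it would sit in (the sizes, number of steps, and learning rate here are my own illustrative choices):

import torch

# Random training data and randomly initialized weights (sizes are illustrative)
N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, H)
w2 = torch.randn(H, D_out)

learning_rate = 1e-6
for step in range(500):
    # Forward pass
    h = x.mm(w1)             # hidden pre-activation
    h_relu = h.clamp(min=0)  # relu
    y_pred = h_relu.mm(w2)   # prediction

    loss = (y_pred - y).pow(2).sum()

    # Backward pass by hand (the snippet you posted)
    grad_y_pred = 2 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Gradient descent step on the weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

If you want to check your understanding, you can compute the same loss with requires_grad=True on w1 and w2, call loss.backward(), and compare w1.grad and w2.grad with the hand-computed gradients.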
Thank you so much, sir, for answering my questions.
I found this code in a course I joined; my instructor provided it there.
So every line here calculates the gradient of the loss with respect to one of the variables involved in forward propagation. Is that right, sir?