Hello guys, I am new to PyTorch. I am going through an example where I found this code:
# Backprop to compute gradients of w1 and w2 with respect to loss
grad_y_pred = 2*(y_pred - y)
grad_w2 = h_relu.t().mm(grad_y_pred)
grad_h_relu = grad_y_pred.mm(w2.t())
grad_h = grad_h_relu.clone()
grad_h[h < 0] = 0
grad_w1 = x.t().mm(grad_h)
There are several lines here that I don't understand, so could anyone please explain them clearly? I tried some reference books, but they all said this code is out of context.
My questions are:
1) Why is 2 used in the first line of the code, i.e. grad_y_pred = 2*(y_pred - y)?
2) In the code above, what does grad_h[h < 0] = 0 signify?
3) What is the use of grad_h = grad_h_relu.clone()?
4) Could you please go through the entire code and explain the significance of each line, point by point?
Thanks in advance
This code is a bit out of context. Where did you get it from?
So I am going to guess that you are trying to do the backward pass by hand for a network whose loss function is
loss = ||y_pred - y||^2, with weights w1 and w2, where
y_pred = h_relu * w2 and
h_relu = relu(x * w1).
If this is true, then the code above computes the backward pass by hand using the chain rule, grad_xxx being the derivative of the loss wrt xxx. For example, the 2 in grad_y_pred = 2*(y_pred - y) comes directly from differentiating the square in the loss. If you write out the gradient of each formula above, you will get exactly these expressions.
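To see that the six lines really are the gradients of these formulas, here is a sketch of the same math written with NumPy so it runs without PyTorch (`.T` / `@` play the role of `.t()` / `.mm()`, and `.copy()` plays the role of `.clone()`); the shapes and random data are made up for illustration, and one entry of grad_w1 is checked against a finite difference:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, H, D_out = 4, 5, 6, 3          # made-up sizes for the sketch
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

def forward(w1, w2):
    h = x @ w1                          # pre-activation
    h_relu = np.maximum(h, 0)           # relu
    y_pred = h_relu @ w2
    loss = np.sum((y_pred - y) ** 2)    # ||y_pred - y||^2
    return h, h_relu, y_pred, loss

h, h_relu, y_pred, loss = forward(w1, w2)

# Manual backward pass: the same six lines as in the question.
grad_y_pred = 2 * (y_pred - y)          # d loss / d y_pred
grad_w2 = h_relu.T @ grad_y_pred        # d loss / d w2
grad_h_relu = grad_y_pred @ w2.T        # d loss / d h_relu
grad_h = grad_h_relu.copy()             # copy so grad_h_relu stays intact
grad_h[h < 0] = 0                       # relu backward: zero where input was negative
grad_w1 = x.T @ grad_h                  # d loss / d w1

# Check one entry of grad_w1 numerically: nudge w1[0, 0] and re-run the forward.
eps = 1e-6
w1p = w1.copy()
w1p[0, 0] += eps
num = (forward(w1p, w2)[3] - loss) / eps
print(num, grad_w1[0, 0])               # the two values should agree closely
```

If the manual expressions were wrong, the finite-difference estimate and grad_w1[0, 0] would disagree.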
The [h < 0] is there because grad_h[h < 0] = 0 changes grad_h in place, setting to 0 all the values where the input to the relu was negative (this is the backward of the relu). But because they don't want to change grad_h_relu in place (it contains the derivative of the loss wrt the output of the relu), they .clone() it first.
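You can see why the copy matters with a tiny made-up example (NumPy here, where plain assignment aliases the array the same way a torch tensor name would, and `.copy()` stands in for `.clone()`):

```python
import numpy as np

grad_h_relu = np.array([[1.0, 2.0], [3.0, 4.0]])
h = np.array([[-1.0, 2.0], [3.0, -4.0]])

# Without a copy: both names refer to the SAME array,
# so the in-place write clobbers grad_h_relu too.
grad_h = grad_h_relu
grad_h[h < 0] = 0
print(grad_h_relu)                       # no longer [[1, 2], [3, 4]]

# With a copy (torch's .clone()): grad_h_relu is left intact.
grad_h_relu = np.array([[1.0, 2.0], [3.0, 4.0]])
grad_h = grad_h_relu.copy()
grad_h[h < 0] = 0
print(grad_h_relu)                       # still [[1, 2], [3, 4]]
```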
Thank you so much, sir, for answering my questions.
I joined a course where my instructor provided this code; that is where I found it.
So every line here calculates the gradient of a different variable involved in forward propagation. Is that right, sir?
Yes, that's right. And it reuses the gradients of the previous variables to do so. That is the backpropagation algorithm.
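To round this off, here is a sketch (not from the thread; sizes, learning rate, and random data are arbitrary choices) of how these gradients would actually be used in a training loop, again in NumPy so it runs standalone. The backward section is the same six lines from the question, and the loss should shrink as the weights are updated:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D_in, H, D_out = 8, 10, 16, 2        # made-up sizes
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H)) * 0.1
w2 = rng.standard_normal((H, D_out)) * 0.1

lr = 1e-3                               # arbitrary learning rate for the sketch
losses = []
for step in range(500):
    # Forward pass
    h = x @ w1
    h_relu = np.maximum(h, 0)
    y_pred = h_relu @ w2
    losses.append(np.sum((y_pred - y) ** 2))

    # Backward pass: the six lines from the question
    grad_y_pred = 2 * (y_pred - y)
    grad_w2 = h_relu.T @ grad_y_pred
    grad_h_relu = grad_y_pred @ w2.T
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T @ grad_h

    # Gradient-descent update using the computed gradients
    w1 -= lr * grad_w1
    w2 -= lr * grad_w2

print(losses[0], losses[-1])            # loss at the start vs the end
```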