Hi,

I have a problem with a gradient calculation in PyTorch:

I have two models that perform the following computations:

y1 = model1(w1,x)

Here y1 is a vector and w1 is a matrix.

y2 = model2(w2,y1)

Here y2 is a scalar.

I have already obtained the following gradients in PyTorch:

dy2/dy1, which is a vector, and dy1/dw1, which is a matrix.

I must use these two gradients explicitly to complete the chain rule.
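
Written out in index form, the chain rule in question would look roughly like this. The indices are an assumption for illustration: i runs over the components of y1, and j, k run over the rows and columns of w1.

```latex
\frac{\partial y_2}{\partial (w_1)_{jk}}
  = \sum_i \frac{\partial y_2}{\partial (y_1)_i}\,
           \frac{\partial (y_1)_i}{\partial (w_1)_{jk}}
```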

How can I calculate the gradient dy2/dw1 in PyTorch?
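
To make the setup concrete, here is a minimal sketch of two ways the two gradients could be combined. It is not a definitive answer. The bodies of `model1` and `model2`, the tensor shapes, and all variable names are hypothetical placeholders, since the real models are not shown. Option A passes dy2/dy1 to `torch.autograd.grad` as `grad_outputs`. Option B builds dy1/dw1 in full and contracts it by hand with `torch.einsum`. In this sketch dy1/dw1 comes out as a 3-D tensor rather than a matrix. If it is stored flattened as a matrix, the contraction would presumably become a matrix-vector product followed by a reshape.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for the two models (the real ones are not shown).
def model1(w1, x):
    return w1 @ x                    # y1: vector of shape (3,)

def model2(w2, y1):
    return (w2 * y1 ** 2).sum()      # y2: scalar

x = torch.randn(4)
w1 = torch.randn(3, 4, requires_grad=True)
w2 = torch.randn(3, requires_grad=True)

y1 = model1(w1, x)
y2 = model2(w2, y1)

# First factor: dy2/dy1, a vector with the same shape as y1.
(dy2_dy1,) = torch.autograd.grad(y2, y1, retain_graph=True)

# Option A: feed dy2/dy1 in as grad_outputs, so autograd contracts it
# with dy1/dw1 (a vector-Jacobian product) without building dy1/dw1.
(dy2_dw1_vjp,) = torch.autograd.grad(
    y1, w1, grad_outputs=dy2_dy1, retain_graph=True
)

# Option B: build the full Jacobian dy1/dw1 explicitly, shape (3, 3, 4),
# and contract it with dy2/dy1 over the y1 index.
dy1_dw1 = torch.autograd.functional.jacobian(lambda w: model1(w, x), w1)
dy2_dw1_explicit = torch.einsum("i,ijk->jk", dy2_dy1, dy1_dw1)

# Reference for comparison: let autograd do the whole chain in one call.
(dy2_dw1_ref,) = torch.autograd.grad(y2, w1)
```

Both results have the same shape as `w1` and can be compared against `dy2_dw1_ref` with `torch.allclose`. Option A avoids materialising the full Jacobian, which may matter when w1 is large.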

Thanks