I’m a little new to PyTorch, so I realize I may not be using these methods correctly. I want y.grad to be the gradient dw/dy, and the .grad property of every other variable x_i to be dz/dx_i. Does that make sense?

If both w and z are computed from y and the x_i, you will need to do two backward passes: one to get the gradients w.r.t. w and one to get the gradients w.r.t. z. You can either use the autograd.grad method to state explicitly which gradient you want, or simply save the .grad field of the variables you care about after each pass and ignore the others.

Thanks for the response. At this point, it might be clarifying to point you to a thread that I started after this one that has a few more details on my problem: Higher-order gradients w.r.t. different functions.
I am currently doing two backward passes, but in principle it seems like I should be able to do just one.