I want a neural network to predict both a value and the derivative of that value. Is the following code the correct way to do it?

```
import torch
from torch import nn
from torch.autograd import grad


class net(nn.Module):
    def __init__(self):
        super(net, self).__init__()
        self.lin1 = nn.Linear(3, 30)
        self.lin2 = nn.Linear(30, 1)

    def forward(self, p):
        x = self.lin1(p)
        x = nn.ReLU()(x)
        return self.lin2(x)


x = torch.randn(1000, 3)
y = (5 * torch.sin(x) + 3 * torch.cos(x)).sum(dim=-1).unsqueeze(-1)
z = (5 * torch.cos(x) - 3 * torch.sin(x)).sum(dim=-1).unsqueeze(-1)

model = net()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-3)

for epoch in range(10000):
    model.train()
    x.requires_grad = True
    optimizer.zero_grad()
    output = model(x)
    grad_x = grad(output.sum(), x, retain_graph=True)[0]
    loss_y = nn.MSELoss()(output, y)
    loss_z = nn.MSELoss()(grad_x.sum(dim=-1).unsqueeze(-1), z)
    loss = loss_y + loss_z
    loss.backward(retain_graph=True)
    optimizer.step()
    print('Loss_y = {:.4f} | Loss_z = {:.4f}.'.format(loss_y.item(), loss_z.item()))
```

I checked the `grad_fn` attribute of these losses and found `loss_y.grad_fn = <MseLossBackward object at 0x0000024F2AB8DF98>`, but `loss_z.grad_fn = None`. So although `loss_z` decreases, the loss on the derivative of the output doesn't actually participate in gradient descent. Probably the model just predicts `y` so well that it happens to predict `z` well too. If the dataset is not as easy as this one, `loss_z` might not decrease at all.
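The symptom can be reproduced in isolation. A minimal sketch (the zero target here is a dummy, just for illustration):

```
import torch
from torch import nn
from torch.autograd import grad

x = torch.randn(4, 3, requires_grad=True)
model = nn.Sequential(nn.Linear(3, 30), nn.ReLU(), nn.Linear(30, 1))
output = model(x)

# Same call as in the training loop: retain_graph only keeps the graph
# alive for a later backward; it does NOT make the result differentiable.
grad_x = grad(output.sum(), x, retain_graph=True)[0]
loss_z = nn.MSELoss()(grad_x.sum(dim=-1).unsqueeze(-1), torch.zeros(4, 1))

print(grad_x.requires_grad)  # False: the returned gradient is detached
print(loss_z.grad_fn)        # None: this loss cannot backpropagate
```

So the gradient tensor returned by `grad` is detached from the graph, and any loss built from it is a dead end for autograd.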

So how do I predict the derivative of the output correctly?
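For what it's worth, a sketch of what I suspect is the missing piece: passing `create_graph=True` to `grad` should make the returned gradient itself part of the graph, so a loss built from it gets a `grad_fn` and can update the weights. (A smooth activation such as `Tanh` may also fit derivatives better than `ReLU`, whose second derivative is zero; that choice is my assumption, not something verified here.)

```
import torch
from torch import nn
from torch.autograd import grad

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(3, 30), nn.Tanh(), nn.Linear(30, 1))
x = torch.randn(8, 3, requires_grad=True)
output = model(x)

# Without create_graph: the gradient is detached (the original problem).
g_no = grad(output.sum(), x, retain_graph=True)[0]
print(g_no.grad_fn)  # None

# With create_graph=True: the gradient is differentiable, so a loss on it
# has a grad_fn and backpropagates into the model parameters.
g_yes = grad(output.sum(), x, create_graph=True)[0]
print(g_yes.grad_fn is not None)  # True

loss_z = (g_yes.sum(dim=-1) ** 2).mean()  # dummy derivative loss
loss_z.backward()
print(model[0].weight.grad is not None)  # True: weights receive gradients
```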