Loss function that depends on the output gradients

I would like to calculate the loss in the following form:
[image: loss formula, not available]
where u_bc and \hat{u}_bc are the predicted and exact values at x_1, and u''_r and \hat{u}''_r are the predicted and exact second derivatives of the output with respect to the input at x_2. x_1 and x_2 are different samples.
I am trying to implement it in the following way:
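Since the formula image is missing, here is a guess at what it may look like, reconstructed only from the description above. It assumes the loss function is a mean squared error; the sample counts N_bc and N_r and the sample index i are names introduced here for illustration and do not appear in the original post.

```latex
% Assumed form: sum of a boundary term and a residual term, both as MSE.
% u is the prediction, \hat{u} the exact value, as in the description.
\mathcal{L}
  = \frac{1}{N_{bc}} \sum_{i=1}^{N_{bc}}
      \left( u_{bc}\big(x_1^{(i)}\big) - \hat{u}_{bc}\big(x_1^{(i)}\big) \right)^2
  + \frac{1}{N_{r}} \sum_{i=1}^{N_{r}}
      \left( u''_{r}\big(x_2^{(i)}\big) - \hat{u}''_{r}\big(x_2^{(i)}\big) \right)^2
```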

# forward pass to calculate the first loss component
u_bc_pred = self.forward(self.X_u)
loss_bcs = self.loss_fnc(u_bc_pred, self.Y_u)

# the second loss component involves the second derivative of output
u_r_pred = self.forward(self.X_r)
u_x = torch.autograd.grad(u_r_pred, self.X_r, torch.ones_like(u_r_pred), retain_graph=True, create_graph=True)[0] / self.sigma_x
u_xx = torch.autograd.grad(u_x, self.X_r, torch.ones_like(u_x), retain_graph=True, create_graph=False)[0] / self.sigma_x
loss_res = self.loss_fnc(u_xx, self.Y_r)

# Total loss
loss = loss_res + loss_bcs
self.loss_log.append(loss.data)

# backpropagation
optimizer.zero_grad()
loss.backward()
optimizer.step()

My code does not seem to decrease the u_xx loss (loss_res). Is there anything wrong with the way I wrote the second loss term, which involves derivatives of the solution?
Could someone please take a look?
Thanks a lot!
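One thing that may be worth checking is the create_graph=False in the second torch.autograd.grad call. With that setting, the returned u_xx is not attached to the autograd graph, so loss_res would contribute nothing to the parameter gradients when loss.backward() runs, which would be consistent with the u_xx loss not decreasing. Below is a minimal sketch of that effect. The small network, the sample points, and the zero targets are hypothetical stand-ins, not the model from the post, and the sigma_x scaling is left out.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the network in the post.
net = torch.nn.Sequential(
    torch.nn.Linear(1, 16),
    torch.nn.Tanh(),
    torch.nn.Linear(16, 1),
)

# Hypothetical residual points and placeholder targets.
x_r = torch.linspace(-1.0, 1.0, 8).reshape(-1, 1).requires_grad_(True)
y_r = torch.zeros_like(x_r)


def second_derivative(create_graph_for_u_xx):
    """Return d^2 u / dx^2 at x_r using two calls to torch.autograd.grad."""
    u = net(x_r)
    # create_graph=True keeps u_x differentiable so it can be differentiated again.
    u_x = torch.autograd.grad(u, x_r, torch.ones_like(u), create_graph=True)[0]
    # Only if create_graph is True here does u_xx stay connected to the parameters.
    u_xx = torch.autograd.grad(
        u_x, x_r, torch.ones_like(u_x), create_graph=create_graph_for_u_xx
    )[0]
    return u_xx


u_xx_detached = second_derivative(False)  # as in the post
u_xx_attached = second_derivative(True)   # suggested change

print(u_xx_detached.requires_grad)  # False: no gradient can flow back from it
print(u_xx_attached.requires_grad)  # True

# With the attached version, the residual loss reaches the first-layer weights.
loss_res = torch.nn.functional.mse_loss(u_xx_attached, y_r)
loss_res.backward()
first_layer_has_grad = net[0].weight.grad is not None
print(first_layer_has_grad)
```

If this is indeed the cause, setting create_graph=True in the second call of the original code may be enough. Note that in this sketch the output-layer bias receives no gradient from the residual term, because a second derivative with respect to the input does not depend on it.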