Hi!

I am trying to solve a PDE, using the initial values and knowledge of the PDE itself to construct the losses. The code is more or less copied from a TensorFlow implementation, yet I get different results, so I am inclined to believe that something weird is going on when backpropagating across the same graph multiple times and then optimizing across these as well.

In the neural net:

```
x.grad = None
u.backward(torch.ones((batch_size, 1)), retain_graph=True, create_graph=True)
u_x = x.grad
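# retain_graph: keeps the graph alive so it can be backpropagated through again; every derivative here needs it
# create_graph: builds a graph for the gradient itself, so that x.grad has a grad_fn
x.grad = None  # reset: .grad accumulates, otherwise the next read of x.grad would be u_x + u_xx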
u_x.backward(torch.ones((batch_size, 1)), retain_graph=True, create_graph=True)
u_xx = x.grad
x.grad = None
v.backward(torch.ones((batch_size, 1)), retain_graph=True, create_graph=True)
v_x = x.grad
x.grad = None  # reset: .grad accumulates, otherwise the next read of x.grad would be v_x + v_xx
v_x.backward(torch.ones((batch_size, 1)), retain_graph=True, create_graph=True)
v_xx = x.grad
t.grad = None
u.backward(torch.ones((batch_size, 1)), retain_graph=True, create_graph=True)
u_t = t.grad
t.grad = None
v.backward(torch.ones((batch_size, 1)), retain_graph=True, create_graph=True)
v_t = t.grad
f_u = u_t.float() + 0.5 * v_xx.float() + (u ** 2 + v ** 2) * v
f_v = v_t.float() - 0.5 * u_xx.float() - (u ** 2 + v ** 2) * u
```
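
For comparison, here is a minimal sketch of the same derivatives computed with `torch.autograd.grad`, which returns each gradient directly instead of accumulating it into `x.grad` / `t.grad`, so no manual resets are needed. The helper name `grad` and the analytic `u` and `v` are assumptions made for illustration: they stand in for the network outputs so that the derivatives can be checked by hand.

```python
import torch


def grad(outputs, inputs):
    """Return d(outputs)/d(inputs) as a tensor that can be differentiated again."""
    return torch.autograd.grad(
        outputs,
        inputs,
        grad_outputs=torch.ones_like(outputs),
        create_graph=True,  # keeps the graph, so second derivatives are possible
    )[0]


# Hypothetical stand-ins for the inputs and the two network outputs.
batch_size = 8
x = torch.linspace(-1.0, 1.0, batch_size).reshape(-1, 1).requires_grad_(True)
t = torch.linspace(0.0, 1.0, batch_size).reshape(-1, 1).requires_grad_(True)
u = torch.sin(x) * torch.exp(-t)
v = torch.cos(x) * t

# Each call returns its own tensor; nothing is written to x.grad or t.grad.
u_x = grad(u, x)
u_xx = grad(u_x, x)
v_x = grad(v, x)
v_xx = grad(v_x, x)
u_t = grad(u, t)
v_t = grad(v, t)

# Same residuals as in the snippet above.
f_u = u_t + 0.5 * v_xx + (u ** 2 + v ** 2) * v
f_v = v_t - 0.5 * u_xx - (u ** 2 + v ** 2) * u
```

With this pattern the second derivatives cannot pick up a leftover first derivative, because there is no shared `.grad` buffer involved. Comparing `u_xx` from both versions on the same batch should show whether accumulation is the source of the discrepancy.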