So, I have this function:

```
def ode_residual(self,
                 time: th.DoubleTensor,
                 state: th.DoubleTensor) -> th.DoubleTensor:
    dtheta = th.autograd.grad(state.sum(), time, retain_graph=True)[0]
    d2theta = th.autograd.grad(dtheta.sum(), time, retain_graph=True)[0]
```

where `time` and two other tensors are concatenated as the input to an MLP. `time` has `requires_grad` set to `True` and the other two tensors do not. The shape of `time` is `(b, 1)` and the other two tensors have shape `(b, 2)`. The MLP output is sliced into three tensors; `state` is one of them and has shape `(b, 1)`, where `b` is the batch size. Here is the MLP:

```
import torch as th
import torch.nn as nn
from typing import Tuple


class MLP(nn.Module):
    def __init__(self,
                 width: int,
                 depth: int,
                 pretrained: bool = False,
                 dtype=th.double) -> None:
        super().__init__()
        # Input is the concatenation of time (b, 1) and the two (b, 2) tensors.
        self.layers = [nn.Linear(1 + 2 + 2, width, dtype=dtype)]
        self.layers.extend([nn.Linear(width, width, dtype=dtype) for _ in range(depth)])
        # Output is later sliced into state (1), tensor 2 (1), and tensor 3 (2).
        self.layers.append(nn.Linear(width, 1 + 1 + 2, dtype=dtype))
        self.layers = nn.Sequential(*self.layers)

    def output_to_states(self, output) -> Tuple[th.DoubleTensor, th.DoubleTensor, th.DoubleTensor]:
        """
        Returns:
            state
            tensor 2
            tensor 3
        """
        return output[:, 0:1], output[:, 1:2], output[:, 2:]

    def forward(self,
                t: th.DoubleTensor,
                x0: th.DoubleTensor,
                x1: th.DoubleTensor) -> th.DoubleTensor:
        x = th.cat([t, x0, x1], dim=1)
        return self.layers(x)
```
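
For context, this is roughly how everything is wired together. The batch size, hyperparameters, and input values below are placeholders, not my real ones:

```
# Rough sketch of how the inputs are built (placeholder values).
b = 8                                                         # batch size
time = th.rand(b, 1, dtype=th.double, requires_grad=True)     # (b, 1), requires grad
x0 = th.rand(b, 2, dtype=th.double)                           # (b, 2), no grad
x1 = th.rand(b, 2, dtype=th.double)                           # (b, 2), no grad

model = MLP(width=64, depth=3)
output = model(time, x0, x1)                                  # (b, 1 + 1 + 2)
state, tensor2, tensor3 = model.output_to_states(output)      # state has shape (b, 1)
```

`state` and `time` are then what get passed into `ode_residual` above.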

First, in the code above, `dtheta` comes out as a tensor of shape `(b, 1)` in which every entry is the same value. Why is this? Shouldn't all batch entries be independent, so that even though we take a summation, the gradients are different for each batch element? Also, calculating `d2theta` results in the following error even though I set `retain_graph` to `True`. Any help in answering these questions would be appreciated.

```
File "c:\path\to\project\core\dynamics.py", line 93, in ode_residual
d2theta = th.autograd.grad(dtheta.sum(), time, retain_graph=True)[0]
File "C:\path\to\project\.venv\lib\site-packages\torch\autograd\__init__.py", line 303, in grad
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```
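
For reference, my expectation for the per-batch gradients comes from a toy case like this (hypothetical snippet, not my actual code), where the gradient of a summed output with respect to a batched leaf tensor is different for every row:

```
import torch as th

# Toy case: each row of y depends only on its own row of t,
# so the gradient of y.sum() w.r.t. t should differ per row.
t = th.rand(4, 1, dtype=th.double, requires_grad=True)
y = t ** 2
g = th.autograd.grad(y.sum(), t)[0]
print(g)  # equals 2 * t, i.e. a different value for each batch entry
```

That is why I expected `dtheta` to vary across the batch as well.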