What is the best way to optimize \theta for the following loss function:

I tried the following, but it does not work:

```
optimizer.zero_grad()
Input.requires_grad_()
Output = Model(Input)
Output_max = Output[0,target1]
Output_max.backward(retain_graph = True)
loss = criterion(Input.grad, target)
loss.backward()
optimizer.step()
```

I get the following error from `loss.backward()`:

```
element 0 of tensors does not require grad and does not have a grad_fn
```
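
The error happens because `Input.grad` is produced by a plain `backward()` call: the backward pass itself is not recorded, so `Input.grad` has no `grad_fn` and the outer `loss.backward()` has nothing to differentiate. One possible fix is to compute the input gradient with `torch.autograd.grad(..., create_graph=True)`, which keeps the gradient connected to the graph so the second backward pass can reach the model parameters. Below is a minimal sketch of that idea; the model, criterion, shapes, and targets are stand-ins I made up, not the original code:

```
import torch
import torch.nn as nn

model = nn.Linear(4, 3)                      # stand-in for Model
criterion = nn.MSELoss()                     # stand-in for criterion
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inp = torch.randn(1, 4, requires_grad=True)  # stand-in for Input
target1 = 0                                  # stand-in class index
target = torch.zeros_like(inp)               # stand-in gradient target

optimizer.zero_grad()
out = model(inp)
out_max = out[0, target1]

# create_graph=True records the backward pass itself, so inp_grad
# carries a grad_fn and the outer loss can backpropagate into the
# model parameters.
(inp_grad,) = torch.autograd.grad(out_max, inp, create_graph=True)

loss = criterion(inp_grad, target)
loss.backward()
optimizer.step()
```

With this change the first differentiation produces a tensor that is itself part of the graph, so the second `backward()` no longer fails.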