I want to train two agents that play a game (in the game-theory sense) through gradient ascent. Each agent has as many parameters as it has strategies. The parameters are initialized with normally distributed random values and are turned into a probability distribution by applying a softmax. I am aware of `torch.optim`, but I want to write the gradient-ascent step myself, since I later want to investigate modifications to it. The code below works as expected:

```python
import torch as T
lr = 0.01
std = 0.01
num_strategies1 = 3
num_strategies2 = 5
# Generate random payoff matrices for both players.
payoff_matrix1 = T.randn((num_strategies1, num_strategies2))
payoff_matrix2 = T.randn((num_strategies2, num_strategies1))
params1 = std * T.randn(num_strategies1)
params2 = std * T.randn(num_strategies2)
params1.requires_grad_()
params2.requires_grad_()
for _ in range(10):
    probs1 = T.softmax(params1, dim=0)
    probs2 = T.softmax(params2, dim=0)
    # Expected payoffs under the current mixed strategies.
    payoff1 = T.dot(probs1, T.matmul(payoff_matrix1, probs2))
    payoff2 = T.dot(probs2, T.matmul(payoff_matrix2, probs1))
    grad1 = T.autograd.grad(payoff1, params1, create_graph=True)[0]
    grad2 = T.autograd.grad(payoff2, params2, create_graph=True)[0]
    params1.data += lr * grad1.data  # GRADIENT ASCENT
    params2.data += lr * grad2.data  # GRADIENT ASCENT
```

Instead of using `.data`, I want to use the `torch.no_grad()` context manager. So I replace the two lines marked with `GRADIENT ASCENT` with the following code:

```python
payoff1.backward(retain_graph=True)
with T.no_grad():
    params1 += lr * grad1
params2.grad.zero_()
payoff2.backward()
with T.no_grad():
    params2 += lr * grad2
params1.grad.zero_()
params2.grad.zero_()
```

However, I now get the following error:

```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [3]] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
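For what it's worth, I do understand the version-counter mechanism in general: autograd saves tensors during the forward pass and refuses to use them in the backward pass if they were modified in place afterwards. This unrelated toy example (my own construction, nothing to do with the game code above) triggers the same class of error:

```python
import torch as T

# Toy repro of the same error class: sigmoid's backward saves its *output*,
# so modifying that output in place invalidates the saved tensor.
x = T.randn(3, requires_grad=True)
y = T.sigmoid(x)   # the backward pass will need the saved value of y
y.add_(1.0)        # in-place op bumps y's version counter
try:
    y.sum().backward()
except RuntimeError as e:
    print("RuntimeError:", e)
```

But in my game code I never knowingly modify a tensor that the backward pass still needs, which is why the error surprises me.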

I’ve never had this issue before when using `torch.no_grad()`. Surprisingly, the issue also disappears if I replace `probs1 = T.softmax(params1, dim=0)` with a meaningless `probs1 = T.sigmoid(params1)` (and likewise for `probs2`).

How do I need to change my code so that I can use `torch.no_grad()`?
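For context, this is the kind of single-objective `torch.no_grad()` update that has always worked for me (with a hypothetical toy payoff, not the bimatrix game above):

```python
import torch as T

lr = 0.01
params = T.randn(3, requires_grad=True)
for _ in range(100):
    probs = T.softmax(params, dim=0)
    # Hypothetical toy objective: expected value of a fixed payoff vector.
    payoff = T.dot(probs, T.tensor([1.0, 2.0, 3.0]))
    if params.grad is not None:
        params.grad.zero_()  # clear stale gradients before backward
    payoff.backward()
    with T.no_grad():
        params += lr * params.grad  # gradient ascent step
# Probability mass gradually shifts toward the highest-payoff strategy.
print(T.softmax(params, dim=0))
```

I would like the two-agent version to follow this same pattern.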