Using torch.no_grad() gives RuntimeError about modifying variable by an inplace operation

I want to train two agents that play a game (in the game-theoretic sense) against each other via gradient ascent. Each agent has as many parameters as it has strategies. The parameters are initialized with normally distributed random values and are turned into a probability distribution over strategies by a softmax. I am aware of torch.optim, but I want to write the gradient-ascent step myself, since I later want to investigate modifications to it. The code below works as expected:

import torch as T

lr = 0.01
std = 0.01
num_strategies1 = 3
num_strategies2 = 5
# Generate random payoff matrices for both players.
payoff_matrix1 = T.randn((num_strategies1, num_strategies2))
payoff_matrix2 = T.randn((num_strategies2, num_strategies1))

# The parameters must require grad so autograd.grad can differentiate w.r.t. them.
params1 = (std * T.randn(num_strategies1)).requires_grad_()
params2 = (std * T.randn(num_strategies2)).requires_grad_()

for _ in range(10):
  probs1 = T.softmax(params1, dim=0)
  probs2 = T.softmax(params2, dim=0)
  payoff1 =, T.matmul(payoff_matrix1, probs2))
  payoff2 =, T.matmul(payoff_matrix2, probs1))
  grad1 = T.autograd.grad(payoff1, params1, create_graph=True)[0]
  grad2 = T.autograd.grad(payoff2, params2, create_graph=True)[0] += lr * grad1  # GRADIENT ASCENT += lr * grad2  # GRADIENT ASCENT

Instead of using .data, I want to use the torch.no_grad() context manager. So I replace the two lines marked with GRADIENT ASCENT with the following code:

  with T.no_grad():
    params1 += lr * grad1

  with T.no_grad():
    params2 += lr * grad2

However, I now get the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [3]] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
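For reference, the anomaly detection mentioned in the hint can be enabled globally or scoped with a context manager. This is just a sketch on a stand-alone tensor, not my actual training loop:

```python
import torch as T

# Enable anomaly detection, as the error message suggests. Autograd then
# records a forward-pass traceback for every op, so a failing backward
# points at the operation whose saved tensor went stale.
T.autograd.set_detect_anomaly(True)

# It can also be scoped to a region with the context-manager form:
x = T.randn(3, requires_grad=True)
with T.autograd.detect_anomaly():
    y = T.softmax(x, dim=0).sum()
    g = T.autograd.grad(y, x)[0]

T.autograd.set_detect_anomaly(False)  # turn it off again; it slows autograd down
```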

I have never run into this issue with torch.no_grad() before. Surprisingly, the error also disappears if I replace probs1 = T.softmax(params1, dim=0) with probs1 = T.sigmoid(params1) (and likewise for probs2), even though sigmoid makes no sense here, since the result is no longer a probability distribution.

How do I need to change my code so that I can use torch.no_grad()?