Using torch.no_grad() gives a RuntimeError about a variable modified by an inplace operation

I want to train two agents that play a game (in the game-theoretic sense) via gradient ascent. Each agent has as many parameters as it has strategies. The parameters are initialized with normally distributed random values and are turned into a probability distribution over strategies by a softmax. I am aware of torch.optim, but I want to write the gradient-ascent step myself, since I later want to investigate modifications to it. The code below works as expected:

import torch as T

lr = 0.01
std = 0.01
num_strategies1 = 3
num_strategies2 = 5
# Generate random payoff matrices for both players.
payoff_matrix1 = T.randn((num_strategies1, num_strategies2))
payoff_matrix2 = T.randn((num_strategies2, num_strategies1))

# One parameter per strategy, initialized with small Gaussian noise.
params1 = std * T.randn(num_strategies1)
params2 = std * T.randn(num_strategies2)
params1.requires_grad_()
params2.requires_grad_()

for _ in range(10):
  # Turn the parameters into mixed strategies.
  probs1 = T.softmax(params1, dim=0)
  probs2 = T.softmax(params2, dim=0)
  # Expected payoff of each player under the current mixed strategies.
  payoff1 = T.dot(probs1, T.matmul(payoff_matrix1, probs2))
  payoff2 = T.dot(probs2, T.matmul(payoff_matrix2, probs1))
  grad1 = T.autograd.grad(payoff1, params1, create_graph=True)[0]
  grad2 = T.autograd.grad(payoff2, params2, create_graph=True)[0]
  params1.data += lr * grad1.data  # GRADIENT ASCENT
  params2.data += lr * grad2.data  # GRADIENT ASCENT

Instead of going through .data, I want to use the torch.no_grad() context manager, so I replace the two lines marked GRADIENT ASCENT with the following code:

  payoff1.backward(retain_graph=True)
  with T.no_grad():
    params1 += lr * grad1  # GRADIENT ASCENT
    params2.grad.zero_()

  payoff2.backward()
  with T.no_grad():
    params2 += lr * grad2  # GRADIENT ASCENT
    params1.grad.zero_()
    params2.grad.zero_()

However, I now get the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [3]] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
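
As the hint suggests, anomaly detection can be enabled to find the operation that fails to compute its gradient; for reference, this amounts to adding a single line before the training loop (debugging only, since it slows the run down considerably):

import torch as T

# Debugging only: record forward-pass stack traces so the backward error
# points at the forward operation that saved the offending tensor.
T.autograd.set_detect_anomaly(True)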

I have never run into this issue with torch.no_grad() before. Surprisingly, the error also disappears if I replace probs1 = T.softmax(params1, dim=0) with the (game-theoretically meaningless) probs1 = T.sigmoid(params1), and likewise for probs2.
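
To be explicit, the substitution that makes the error vanish is the following; it is useless for my purposes, since the outputs no longer sum to 1, but it narrows the problem down to the softmax:

  # Debugging substitution only: sigmoid maps each parameter to (0, 1)
  # independently, so probs1/probs2 are no longer probability distributions.
  probs1 = T.sigmoid(params1)
  probs2 = T.sigmoid(params2)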

How do I need to change my code so that I can use torch.no_grad()?
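
For context, my current understanding (which may well be wrong) is that writing through .data bypasses autograd's version counter, while an in-place update under torch.no_grad() still bumps it, which would explain the "is at version 1; expected version 0" part of the message. A minimal sketch of what I mean, assuming the internal attribute _version behaves the way I expect:

import torch as T

x = T.ones(3, requires_grad=True)

x.data += 0.1        # write through .data: not tracked by autograd
print(x._version)    # expected: 0 -- the version counter is not bumped

with T.no_grad():
  x += 0.1           # genuine in-place op: the version counter is bumped
print(x._version)    # expected: 1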