I want to train two agents that play a game (in the game-theory sense) through gradient ascent. Each agent has as many parameters as it has strategies. The parameters are initialized with normally distributed random values and are turned into a probability distribution by applying a softmax. I am aware of `torch.optim`, but I want to write the gradient-ascent step myself, since I later want to investigate modifications to it. The code below works as expected:

```python
import torch as T
lr = 0.01
std = 0.01
num_strategies1 = 3
num_strategies2 = 5
# Generate random payoff matrices for both players.
payoff_matrix1 = T.randn((num_strategies1, num_strategies2))
payoff_matrix2 = T.randn((num_strategies2, num_strategies1))
params1 = std * T.randn(num_strategies1)
params2 = std * T.randn(num_strategies2)
params1.requires_grad_()
params2.requires_grad_()
for _ in range(10):
    probs1 = T.softmax(params1, dim=0)
    probs2 = T.softmax(params2, dim=0)
    # Expected payoffs under the current mixed strategies.
    payoff1 = T.dot(probs1, T.matmul(payoff_matrix1, probs2))
    payoff2 = T.dot(probs2, T.matmul(payoff_matrix2, probs1))
    grad1 = T.autograd.grad(payoff1, params1, create_graph=True)[0]
    grad2 = T.autograd.grad(payoff2, params2, create_graph=True)[0]
    params1.data += lr * grad1.data  # GRADIENT ASCENT
    params2.data += lr * grad2.data  # GRADIENT ASCENT
```

Instead of using `.data`, I want to use the `torch.no_grad()` context manager. So I replace the two lines marked with `GRADIENT ASCENT` with the following code:

```python
payoff1.backward(retain_graph=True)
with T.no_grad():
    params1 += lr * grad1
params2.grad.zero_()
payoff2.backward()
with T.no_grad():
    params2 += lr * grad2
params1.grad.zero_()
params2.grad.zero_()
```

However, I now get the following error:

```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [3]] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
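For what it's worth, I do understand the version-counter mechanism in general: autograd saves tensors during the forward pass and refuses to use them in the backward pass if they were modified in place afterwards. This unrelated toy example (my own construction, nothing to do with the game code above) triggers the same class of error:

```python
import torch as T

# Toy repro of the same error class: sigmoid's backward saves its *output*,
# so modifying that output in place invalidates the saved tensor.
x = T.randn(3, requires_grad=True)
y = T.sigmoid(x)   # the backward pass will need the saved value of y
y.add_(1.0)        # in-place op bumps y's version counter
try:
    y.sum().backward()
except RuntimeError as e:
    print("RuntimeError:", e)
```

But in my game code I never knowingly modify a tensor that the backward pass still needs, which is why the error surprises me.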

I’ve never had this issue before when using `torch.no_grad()`. Surprisingly, the issue also disappears if I replace `probs1 = T.softmax(params1, dim=0)` with a meaningless `probs1 = T.sigmoid(params1)` (and likewise for `probs2`).

How do I need to change my code so that I can use `torch.no_grad()`?
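For context, this is the kind of single-objective `torch.no_grad()` update that has always worked for me (with a hypothetical toy payoff, not the bimatrix game above):

```python
import torch as T

lr = 0.01
params = T.randn(3, requires_grad=True)
for _ in range(100):
    probs = T.softmax(params, dim=0)
    # Hypothetical toy objective: expected value of a fixed payoff vector.
    payoff = T.dot(probs, T.tensor([1.0, 2.0, 3.0]))
    if params.grad is not None:
        params.grad.zero_()  # clear stale gradients before backward
    payoff.backward()
    with T.no_grad():
        params += lr * params.grad  # gradient ascent step
# Probability mass gradually shifts toward the highest-payoff strategy.
print(T.softmax(params, dim=0))
```

I would like the two-agent version to follow this same pattern.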