RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: at version 12; expected version 10 instead

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [16, 4, 84, 84]] is at version 12; expected version 10 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
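The two version numbers come from a counter that every tensor carries and that each in-place operation increments. Autograd records the counter when it saves a tensor for the backward pass and checks it again during `backward()`. "Is at version 12; expected version 10" therefore means the saved tensor was modified in place after it was saved. Below is a minimal sketch that reproduces the same error on toy tensors. `Tensor._version` is an underscore-prefixed attribute, so treat it as a debugging aid rather than stable API.

```python
import torch

w = torch.randn(3, requires_grad=True)   # parameter we want gradients for
x = torch.randn(2, 3)                    # plain input, like an observation batch

y = (x * w).sum()                        # autograd saves x because dy/dw depends on x
saved_version = x._version               # counter value at the moment x was saved

x.add_(1.0)                              # any in-place op bumps the counter
bumped_version = x._version

error_message = ""
try:
    y.backward()                         # autograd notices x changed after it was saved
except RuntimeError as err:
    error_message = str(err)

print(saved_version, bumped_version)
print(error_message)
```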

I am trying to use A2C with 16 workers to play Atari games. How can I debug this error?
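The hint in the error message is the usual first step. With anomaly mode on, autograd records where each operation was created during the forward pass. When `backward()` fails, PyTorch also prints a warning containing the traceback of the forward call whose saved tensor was modified, which points at the layer involved. The sketch below runs this on a toy `Conv2d` whose input is overwritten in place between the forward and backward pass. The `mul_` is a deliberately planted bug, not taken from the original code. In the real script, calling `torch.autograd.set_detect_anomaly(True)` once before `train()` should have the same effect.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(4, 8, kernel_size=3)
obs = torch.randn(2, 4, 84, 84)

caught = False
# Anomaly mode slows training noticeably, so enable it only while debugging.
with torch.autograd.detect_anomaly():
    loss = conv(obs).mean()      # the convolution saves `obs` for its weight gradient
    obs.mul_(0.5)                # planted bug: overwrite the saved input in place
    try:
        loss.backward()          # fails; the forward traceback of the conv is printed as a warning
    except RuntimeError:
        caught = True
```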

Here is the stack trace:

```
Traceback (most recent call last):
  File "C:/Users/Jiyao/PycharmProjects/RLs/PolicyGradient/A2C/run_atari.py", line 61, in <module>
    train()
  File "C:/Users/Jiyao/PycharmProjects/RLs/PolicyGradient/A2C/run_atari.py", line 53, in train
    agent.learn(i_step)
  File "C:\Users\Jiyao\PycharmProjects\RLs\PolicyGradient\A2C\a2c.py", line 90, in learn
    self.memo.total_entropy.backward()
  File "C:\Users\Jiyao\.conda\envs\baselines\lib\site-packages\torch\_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\Users\Jiyao\.conda\envs\baselines\lib\site-packages\torch\autograd\__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [16, 4, 84, 84]] is at version 12; expected version 10 instead. Hint:
```

The tensor named in the message, [torch.FloatTensor [16, 4, 84, 84]], has the same shape as the batch of images I feed into the network.
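A convolution layer saves its input so it can compute the weight gradient. That is why the image batch, which does not require grad itself, can still trigger this check. If the same tensor object is updated in place before `total_entropy.backward()` runs, the backward pass finds a changed tensor. This can happen, for example, when a frame-stack buffer is shifted with slice assignment during the rollout. The network and agent code are not shown, so this is only a sketch of one likely cause, not a diagnosis. `update_inplace`, `update_out_of_place` and `rollout` are hypothetical helpers standing in for however the agent builds the next state. The conv layer shape is likewise an assumption.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(4, 16, kernel_size=8, stride=4)  # assumed Atari-style first layer

def update_inplace(state, frame):
    # Hypothetical frame stack that shifts the buffer in place (bumps state._version).
    state[:, :-1] = state[:, 1:].clone()
    state[:, -1] = frame
    return state

def update_out_of_place(state, frame):
    # Builds a new tensor each step, so tensors saved by earlier forwards stay intact.
    return torch.cat((state[:, 1:], frame.unsqueeze(1)), dim=1)

def rollout(update, steps=3, workers=16):
    state = torch.zeros(workers, 4, 84, 84)
    total = torch.zeros(())
    for _ in range(steps):
        total = total + conv(state).mean()   # loss accumulated over the rollout
        frame = torch.rand(workers, 84, 84)
        state = update(state, frame)
    total.backward()                          # needs every saved `state` unchanged

rollout(update_out_of_place)                  # succeeds

failed = False
try:
    rollout(update_inplace)                   # same error as in the traceback
except RuntimeError:
    failed = True
```

The out-of-place version allocates a new `[16, 4, 84, 84]` tensor each step. That costs one copy per step but leaves every tensor autograd saved during the rollout untouched until `backward()`. Passing `state.clone()` to the network before updating the buffer in place is an equivalent quick fix.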

Below is my training code:

The network:

The Agent: