Hi everyone, I'm working on controlling a two-link robot arm with DDPG. My custom environment is similar to the MuJoCo Reacher environment, and I'm trying to train the fingertip of the arm to follow a specific trajectory (a circle). I was having trouble converging to a good policy, so I'm doing something like DDPGfD: in a separate file I precomputed the desired state trajectories and the corresponding actions using the Euler-Lagrange method (a simplified sketch of how I seed the replay buffer with these demonstrations is included right after the traceback below). The problem is that I'm now getting an in-place operation error:
C:\Users\user\Desktop\project\venv\Lib\site-packages\torch\autograd\__init__.py:266: UserWarning: Error detected in MmBackward0. Traceback of forward call that caused the error:
  File "c:\Users\user\Desktop\project\sym_robot_arm\sample_two_link\04-30 DDPG\twolink_DDPG_trial.py", line 332, in <module>
    main()
  File "c:\Users\user\Desktop\project\sym_robot_arm\sample_two_link\04-30 DDPG\twolink_DDPG_trial.py", line 260, in main
    action = get_action(actor, state)
  File "c:\Users\user\Desktop\project\sym_robot_arm\sample_two_link\04-30 DDPG\twolink_DDPG_trial.py", line 181, in get_action
    action = _actor(state)
  File "C:\Users\user\Desktop\project\venv\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\user\Desktop\project\venv\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "c:\Users\user\Desktop\project\sym_robot_arm\sample_two_link\04-30 DDPG\twolink_DDPG_trial.py", line 153, in forward
    out = 10.0 * F.tanh(self.fc_out(out))
  File "C:\Users\user\Desktop\project\venv\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\user\Desktop\project\venv\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\user\Desktop\project\venv\Lib\site-packages\torch\nn\modules\linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
 (Triggered internally at …\torch\csrc\autograd\python_anomaly_mode.cpp:118.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "c:\Users\user\Desktop\project\sym_robot_arm\sample_two_link\04-30 DDPG\twolink_DDPG_trial.py", line 332, in <module>
    main()
  File "c:\Users\user\Desktop\project\sym_robot_arm\sample_two_link\04-30 DDPG\twolink_DDPG_trial.py", line 298, in main
    update_critic(state_batch, action_batch, Q_target_batch)
    torch.autograd.backward(
  File "C:\Users\user\Desktop\project\venv\Lib\site-packages\torch\autograd\__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 2]], which is output 0 of AsStridedBackward0, is at version 219897; expected version 219896 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
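For context, this is roughly how I generate and store the demonstration transitions I mentioned above. It is a simplified sketch rather than my exact code: demo_states.npy, demo_actions.npy, reward_fn, and replay_buffer.push are placeholder names.

import numpy as np

# Simplified sketch, not my exact code. demo_states has shape (T, state_dim) and
# demo_actions has shape (T, action_dim); both were computed offline from the
# Euler-Lagrange (inverse dynamics) model of the two-link arm for the circle.
demo_states = np.load("demo_states.npy")    # e.g. [q1, q2, dq1, dq2, ...] per step
demo_actions = np.load("demo_actions.npy")  # e.g. [tau1, tau2] per step

def seed_replay_buffer(replay_buffer, demo_states, demo_actions, reward_fn):
    # Store the demonstration transitions once before training starts
    # (DDPGfD-style), so the critic sees "good" data from the beginning.
    for t in range(len(demo_states) - 1):
        s, a, s_next = demo_states[t], demo_actions[t], demo_states[t + 1]
        r = reward_fn(s_next)                # same reward the environment would give
        done = (t == len(demo_states) - 2)
        replay_buffer.push(s, a, r, s_next, done)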
I know what an in-place operation is; I already ran into a few of them while implementing DDPG. What frustrates me is that the error did not occur while I was still adding noise to the action:
action = get_action(actor, state)
if episode < cfg.max_explore_eps:
    p = episode / cfg.max_explore_eps
    action = action.detach() + 1.5 * (1 - p) * next(noise)
My action-selection code looks like the above: I split training into exploration episodes and exploitation episodes. During the exploration episodes the agent can learn the basic dynamics and a rough policy for following the target trajectory, and during the exploitation episodes I wanted to fine-tune that policy into an accurate one. But once the exploration episodes end, PyTorch throws the error above. The only thing that changes is that the OU noise is no longer added to the action, yet that alone triggers the in-place operation error. Does anyone know why?
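My current guess (and I'm not at all sure it's right) is that the .detach() in the noise branch was what kept things safe: it creates a new tensor with no autograd history, so whatever happens to the action afterwards (storing it, reusing it, modifying it in place) never touches the actor's graph, whereas without the noise the raw, graph-attached actor output gets passed along. Here is a minimal, self-contained example of the same kind of error; it is just an illustration, not my training code:

import torch

w = torch.randn(2, 2, requires_grad=True)  # stands in for a linear layer's weight
x = torch.randn(32, 2)                     # stands in for a batch of inputs
out = x @ w                                # MmBackward0 saves x for w's gradient
loss = out.sum()
x += 1.0                                   # in-place change to a tensor autograd saved
loss.backward()                            # RuntimeError: ... modified by an inplace operation

In this toy example, replacing the in-place x += 1.0 with x = x + 1.0 (a new tensor) makes the error go away, which is why I suspect the action.detach() + noise line was accidentally protecting me by rebinding action to a fresh tensor. I still don't see where the in-place modification happens in my actual code, though.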
I also plotted the rewards and rendered the arm to check whether it is learning well. In the early stages the average total reward per episode increased steadily, but after about half of the exploration episodes (when the noise scale starts to shrink) the performance drops dramatically: sometimes a good score, sometimes a really bad one. This makes it look like the noise itself is what is producing the good performance. Can anyone tell me why that might be?
For anyone who wants to see the whole code, here is my GitHub repository; the main script is twolink_DDPG_trial.
I would also appreciate recommendations for other algorithms for controlling a two-link robot arm.