I am building a multi-agent reinforcement learning system with DDPG where each agent has a parameterized action space. When a certain discrete action is chosen, the remaining low-level action parameters become irrelevant. In that case I don't want the policy network to be trained on those parameters, so I thought that if I detach the corresponding outputs, they won't receive gradients. But DDPG trains on sampled batches, so different samples in the same batch need different treatment. Is it possible to customize the backward pass myself?
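To make the question concrete, here is a minimal sketch of one way this per-sample detaching could work in PyTorch (the function name `mask_params` and the shapes are my own assumptions, not from any particular codebase): `torch.where` selects between the live tensor and its detached copy row by row, so gradients only flow back through the samples whose parameters were actually used.

```python
import torch

def mask_params(params: torch.Tensor, use_params: torch.Tensor) -> torch.Tensor:
    """Selectively detach rows of a batched output (hypothetical helper).

    params:     (B, P) continuous action parameters from the policy network.
    use_params: (B,)   bool mask, True where the parameters are actually used.
    Rows where use_params is False are replaced by a detached copy, so no
    gradient flows back through them.
    """
    mask = use_params.unsqueeze(-1)  # (B, 1), broadcast over the P parameters
    return torch.where(mask, params, params.detach())

# Tiny demonstration that gradients reach only the "used" rows.
w = torch.ones(3, 2, requires_grad=True)
params = w * 2.0  # stand-in for the policy network's parameter head
masked = mask_params(params, torch.tensor([True, False, True]))
masked.sum().backward()
# w.grad rows 0 and 2 are 2.0; the detached row 1 gets zero gradient.
```

An alternative with the same effect is `params.register_hook(...)` to zero out gradient rows during backward, which is closer to "customizing the backward process" directly.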