RuntimeError: copy_if failed to synchronize: device-side assert triggered

I’m getting the following errors with my code. It is an adapted version of the PyTorch DQN example.

/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [62,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
Traceback (most recent call last):
  File "/project/ros-kinetic-alphapilot/catkin_ws/src/alphapilot_openai_ros/ardrone_race_track/src/ardrone_v1_ddqn.py", line 548, in <module>
    optimize_model()
  File "/project/ros-kinetic-alphapilot/catkin_ws/src/alphapilot_openai_ros/ardrone_race_track/src/ardrone_v1_ddqn.py", line 451, in optimize_model
    next_state_values[non_final_mask] = target_net(non_final_next_states).max(1)[0].detach()
RuntimeError: copy_if failed to synchronize: device-side assert triggered

The hyperparameters are as follows:

I ran with device=cpu to debug, and the error is raised at line 443:

  File "/project/ros-kinetic-alphapilot/catkin_ws/src/alphapilot_openai_ros/ardrone_race_track/src/ardrone_v1_ddqn.py", line 443, in optimize_model
    state_action_values = policy_net(state_batch).gather(1, action_batch)
RuntimeError: Invalid index in gather at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:457

@ptrblck would you happen to know how I can fix this?

The default cartpole example had 2 actions.

    else:
        return torch.tensor([[random.randrange(2)]], device=device, dtype=torch.long)
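For reference, a minimal sketch of parameterizing that branch on the environment's action count instead of the literal 2 (`n_actions` and `device` here are assumptions standing in for the values used in my script):

```python
import random
import torch

device = torch.device("cpu")  # assumption: CPU for this sketch
n_actions = 7                 # assumption: my environment's action count

# Draw a random action index from the full action range
# instead of the hard-coded random.randrange(2)
action = torch.tensor([[random.randrange(n_actions)]],
                      device=device, dtype=torch.long)
print(action.shape)  # torch.Size([1, 1])
```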

I noticed that it crashes when I update it for my task environment, which has 7 actions.

N_ACTIONS=7 in this case. If I set it to 2, there is no crash.

Why would this be an issue? I can’t seem to locate any other part of the code that hard-codes the total number of actions.

Could you print the shape of policy_net(state_batch) and the min and max values of action_batch?
Some indices are apparently out of bounds for the gather operation.

@ptrblck Here is the output.

print("policy_net(state_batch).shape: {}".format(policy_net(state_batch).shape))
print("state_batch.shape: {}".format(state_batch.shape))
print("action_batch: shape= {}, max= {}, min= {}".format(action_batch.shape, action_batch.max(), action_batch.min()))
state_action_values = policy_net(state_batch).gather(1, action_batch)

output:

policy_net(state_batch).shape: torch.Size([128, 2])
state_batch.shape: torch.Size([128, 3, 180, 320])
action_batch: shape= torch.Size([128, 1]), max= 6, min= 0

I have a total of 7 actions. Action values 0 to 6 are mapped to the following drone movements: FORWARDS, BACKWARDS, STRAFE_LEFT, STRAFE_RIGHT, UP, DOWN, STOP.

There are a total of 8 observations: x, y, z, r, p, y, sonar_value, collision.

I’m running it on the CPU to debug, and it gives the following error:

  File "/project/ros-kinetic-alphapilot/catkin_ws/src/alphapilot_openai_ros/ardrone_race_track/src/ardrone_v1_ddqn.py", line 446, in optimize_model
    state_action_values = policy_net(state_batch).gather(1, action_batch)
RuntimeError: Invalid index in gather at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:457
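The mismatch can be reproduced in isolation: `gather` along dim 1 requires every index to be smaller than the size of that dimension, so action indices up to 6 fail against a `[128, 2]` network output. A small repro using the shapes from the printout above:

```python
import torch

q_values = torch.randn(128, 2)                       # network output: only 2 actions
actions = torch.full((128, 1), 6, dtype=torch.long)  # action index 6 from a 7-action env

try:
    q_values.gather(1, actions)  # index 6 is out of bounds for size-2 dim 1
except (RuntimeError, IndexError) as e:
    print("gather failed:", e)
```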

@ptrblck After viewing the shape of policy_net(state_batch), it would appear that the number of outputs was hard-coded to 2.

@ptrblck I’ve submitted a pull request with updates to the reinforcement_q_learning.py tutorial. I’ve made the DQN network accept the number of outputs as a constructor argument and updated the example to obtain the number of actions from the gym environment’s action space. This should help others who try the DQN example with different gym environments avoid similar issues.
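For anyone hitting the same issue before the tutorial is updated, here is a rough sketch of the change. The layer sizes are illustrative, not the exact tutorial values; `env.action_space.n` is the usual way to read the action count from a gym environment with a discrete action space:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DQN(nn.Module):
    """Sketch of a DQN whose output size is passed in rather than
    hard-coded to 2 (two conv layers here for brevity)."""
    def __init__(self, h, w, outputs):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=5, stride=2)
        self.bn1 = nn.BatchNorm2d(16)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=2)
        self.bn2 = nn.BatchNorm2d(32)

        # Spatial size after one conv with kernel 5, stride 2
        def conv_out(size, kernel=5, stride=2):
            return (size - (kernel - 1) - 1) // stride + 1

        convw = conv_out(conv_out(w))
        convh = conv_out(conv_out(h))
        # Head size now depends on `outputs` instead of the literal 2
        self.head = nn.Linear(convw * convh * 32, outputs)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        return self.head(x.view(x.size(0), -1))

# With a gym environment this would be: n_actions = env.action_space.n
n_actions = 7
net = DQN(180, 320, n_actions)
out = net(torch.randn(4, 3, 180, 320))
print(out.shape)  # torch.Size([4, 7])
```

With the output size wired through the constructor, the same network definition works for CartPole's 2 actions and my drone environment's 7 actions.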


Yeah, it looks like the hard-coded number of outputs creates this issue. Thanks for the PR and the fix! :slight_smile: