I’ve adapted the PyTorch DQN tutorial to build an ARDrone DDQN example in the Gazebo 7 simulator, using images from a ROS camera topic as the observation. However, training crashes.
I’ve checked the rescaled image to make sure it looks correct before it is fed to the model.
I’ve set the following hyperparameters:
batch_size: 128
target_network_update_interval: 2
It throws the following error at line 192 of ardrone_v1_start_training_ddqn.py (in optimize_model):
File "/project/ros-kinetic-alphapilot/catkin_ws/src/alphapilot_openai_ros/ardrone_race_track/src/ardrone_v1_start_training_ddqn.py", line 423, in <module>
main()
File "/project/ros-kinetic-alphapilot/catkin_ws/src/alphapilot_openai_ros/ardrone_race_track/src/ardrone_v1_start_training_ddqn.py", line 398, in main
agent.train()
File "/project/ros-kinetic-alphapilot/catkin_ws/src/alphapilot_openai_ros/ardrone_race_track/src/ardrone_v1_start_training_ddqn.py", line 237, in train
self.optimize_model()
File "/project/ros-kinetic-alphapilot/catkin_ws/src/alphapilot_openai_ros/ardrone_race_track/src/ardrone_v1_start_training_ddqn.py", line 192, in optimize_model
next_state_values[non_final_mask] = self.target_net(non_final_next_states).max(1)[0].detach()
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/project/ros-kinetic-alphapilot/catkin_ws/src/alphapilot_openai_ros/ardrone_race_track/src/model/dqn/dqn.py", line 103, in forward
x = F.relu(self.bn3(self.conv3(x)))
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/conv.py", line 339, in forward
self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
[INFO] [1553228217.202863, 5023.352000]: Shutting down node: ardrone_v1_goto_ddqn
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [32,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [33,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
<snip>
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [125,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [127,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
Process finished with exit code 1
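The assertion `indexValue >= 0 && indexValue < src.sizes[dim]` in the gather kernel says that an index passed to a `gather` call lies outside the size of the gathered dimension. In the DQN tutorial, `gather` selects Q-values by action index, so one possibility is that the replay memory holds an action index that is negative or not smaller than the number of network outputs. Because CUDA kernels run asynchronously, the Python traceback may point at a later call (here `conv3`) rather than the one that failed; running with `CUDA_LAUNCH_BLOCKING=1` or on the CPU usually gives a more accurate location. Below is a minimal pure-Python sketch of such a range check. The names `find_out_of_range_actions` and `sample_actions`, and the value 4 for the number of actions, are hypothetical and not taken from the training script.

```python
def find_out_of_range_actions(actions, n_actions):
    """Return (position, value) pairs for indices that gather() would reject."""
    return [(i, a) for i, a in enumerate(actions) if not 0 <= a < n_actions]


# Hypothetical batch of action indices for a network with 4 outputs.
# Assumption: in the training script this list could be obtained from the
# action tensor, e.g. action_batch.view(-1).tolist().
sample_actions = [0, 3, 1, 4, -1, 2]
bad = find_out_of_range_actions(sample_actions, n_actions=4)
print(bad)  # positions and values that fall outside [0, n_actions)
```

If a check like this reports any entries just before the `gather` call, the mismatch is between the environment's action space and the size of the network's output layer, not in the convolution that the traceback names.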
I’m using Python 2.7 on Ubuntu 16.04. Here are the library versions:
PYTORCH_VERSION='nightly'
CUDA_VERSION='9.0'
CUDNN_VERSION='7.3.1.20'