I’m getting the following type error in a DQN reinforcement learning example that uses camera input from a ROS node:
File "/project/ros-kinetic-alphapilot/catkin_ws/src/alphapilot_openai_ros/ardrone_race_track/src/ardrone_v1_ddqn.py", line 433, in optimize_model
expected_state_action_values = (next_state_values * GAMMA) + reward_batch
RuntimeError: expected type torch.cuda.FloatTensor but got torch.cuda.LongTensor
The relevant part of optimize_model is:
# Compute V(s_{t+1}) for all next states.
# Expected values of actions for non_final_next_states are computed based
# on the "older" target_net; selecting their best reward with max(1)[0].
# This is merged based on the mask, such that we'll have either the expected
# state value or 0 in case the state was final.
next_state_values = torch.zeros(BATCH_SIZE, device=device)
next_state_values[non_final_mask] = target_net(non_final_next_states).max(1)[0].detach()
# Compute the expected Q values
expected_state_action_values = (next_state_values * GAMMA) + reward_batch
How can I fix this?
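One possible cause, assuming the rewards are stored as Python ints when `reward_batch` is built (its construction is not shown above, so this is a guess): `reward_batch` ends up as a LongTensor while `next_state_values` is a FloatTensor. A minimal sketch of casting the rewards before the addition, with made-up stand-in tensors and a hypothetical `GAMMA`, run on CPU for simplicity:

```python
import torch

GAMMA = 0.999  # hypothetical discount factor; use the value from the real script

# Stand-ins for the real batch (assumed shapes and values, for illustration only).
next_state_values = torch.tensor([1.0, 0.0, 2.0, 3.0])  # float32, like torch.zeros(BATCH_SIZE)
reward_batch = torch.tensor([1, 0, -1, 1])               # integer rewards give an int64 (Long) tensor

# Cast the rewards to the dtype of the state values before mixing the two.
reward_batch = reward_batch.to(next_state_values.dtype)

# Same expression as in optimize_model, now with matching float dtypes.
expected_state_action_values = (next_state_values * GAMMA) + reward_batch
```

If this is indeed the cause, the cast could also be applied where the reward is first turned into a tensor, for example by passing `dtype=torch.float` there, so every later use already sees floats.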