Categorical(probs).sample() generates RuntimeError: invalid argument 2: invalid multinomial distribution (encountering probability entry < 0)

Hi,
I’m working on an adaptation of the pytorch actor_critic_py for an RRBot example within an OpenAI ROS Kinetic Gazebo 7 environment.

  def select_action(self, state):
    state = torch.from_numpy(state).float()
    probs, state_value = self.model(state)
    m = Categorical(probs)
    action = m.sample()
    self.model.saved_actions.append(self.saved_action(m.log_prob(action), state_value))
    return action.item()

At some point, either during initialization or when the RRBot swing up task is approximately in this state during simulation:

I consistently get the following run-time error:

[WARN] [1539704325.305267, 1074.472000]: PUBLISHING REWARD...
[WARN] [1539704325.305413, 1074.472000]: PUBLISHING REWARD...DONE=0.0,EP=13
Traceback (most recent call last):
  File "/project/ros-kinetic-deep-rl/catkin_ws/src/rrbot_openai_ros_tutorial/src/rrbot_v0_start_training_actor_critic.py", line 204, in <module>
    main()
  File "/project/ros-kinetic-deep-rl/catkin_ws/src/rrbot_openai_ros_tutorial/src/rrbot_v0_start_training_actor_critic.py", line 125, in main
    action = agent.select_action(state)
  File "/project/ros-kinetic-deep-rl/catkin_ws/src/rrbot_openai_ros_tutorial/src/rrbot_v0_start_training_actor_critic.py", line 64, in select_action
    action = m.sample()
  File "/usr/local/lib/python2.7/dist-packages/torch/distributions/categorical.py", line 110, in sample
    sample_2d = torch.multinomial(probs_2d, 1, True)
RuntimeError: invalid argument 2: invalid multinomial distribution (encountering probability entry < 0) at /pytorch/aten/src/TH/generic/THTensorRandom.cpp:297
[DEBUG] [1539704325.306117, 1074.472000]: END Reseting RobotGazeboEnvironment

@ptrblck Any thoughts on what might be causing this? I tried changing the learning rate, but it still crashes with the above message.

That’s a strange issue, as the probs passed to Categorical are created using F.softmax(action_scores, dim=-1). Could you check if the dimension is set properly for your action_scores? F.softmax should not return negative values.
Also, could you add a print statement of action_states and probs just for the sake of debugging?

I had a similar prob for some reinforcement learning prob. the reason with me was: the softmax had turned into a vector of lovely NaNs. then categorical fails with above error.

Were the values passed to softmax already NaNs or did the softmax op created them?