Categorical(probs).sample() generates RuntimeError: invalid argument 2: invalid multinomial distribution (encountering probability entry < 0)

Hi,
I’m working on an adaptation of the PyTorch actor_critic.py example for an RRBot example within an OpenAI ROS Kinetic Gazebo 7 environment.

  def select_action(self, state):
    state = torch.from_numpy(state).float()
    probs, state_value = self.model(state)
    m = Categorical(probs)
    action = m.sample()
    self.model.saved_actions.append(self.saved_action(m.log_prob(action), state_value))
    return action.item()

At some point, either during initialization or when the RRBot swing up task is approximately in this state during simulation:

I consistently get the following run-time error:

[WARN] [1539704325.305267, 1074.472000]: PUBLISHING REWARD...
[WARN] [1539704325.305413, 1074.472000]: PUBLISHING REWARD...DONE=0.0,EP=13
Traceback (most recent call last):
  File "/project/ros-kinetic-deep-rl/catkin_ws/src/rrbot_openai_ros_tutorial/src/rrbot_v0_start_training_actor_critic.py", line 204, in <module>
    main()
  File "/project/ros-kinetic-deep-rl/catkin_ws/src/rrbot_openai_ros_tutorial/src/rrbot_v0_start_training_actor_critic.py", line 125, in main
    action = agent.select_action(state)
  File "/project/ros-kinetic-deep-rl/catkin_ws/src/rrbot_openai_ros_tutorial/src/rrbot_v0_start_training_actor_critic.py", line 64, in select_action
    action = m.sample()
  File "/usr/local/lib/python2.7/dist-packages/torch/distributions/categorical.py", line 110, in sample
    sample_2d = torch.multinomial(probs_2d, 1, True)
RuntimeError: invalid argument 2: invalid multinomial distribution (encountering probability entry < 0) at /pytorch/aten/src/TH/generic/THTensorRandom.cpp:297
[DEBUG] [1539704325.306117, 1074.472000]: END Reseting RobotGazeboEnvironment

@ptrblck Any thoughts on what might be causing this? I tried changing the learning rate, but it still crashes with the above message.

That’s a strange issue, as the probs passed to Categorical are created using F.softmax(action_scores, dim=-1). Could you check if the dimension is set properly for your action_scores? F.softmax should not return negative values.
Also, could you add a print statement for action_scores and probs, just for the sake of debugging?
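
A minimal sketch of that kind of debugging check, assuming the model exposes its raw scores before the softmax. The helper name `checked_sample` is hypothetical and not part of the original script; it only fails early with a readable message instead of crashing inside torch.multinomial.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Categorical


def checked_sample(action_scores):
    """Sample an action, raising a readable error if the inputs are invalid.

    `action_scores` stands in for the raw (pre-softmax) network output.
    """
    # NaN/Inf already present before the softmax points at the model or its input.
    if not torch.isfinite(action_scores).all():
        raise ValueError("non-finite action_scores: {}".format(action_scores))
    probs = F.softmax(action_scores, dim=-1)
    # NaN or negative entries here would point at the softmax call itself.
    if not torch.isfinite(probs).all() or (probs < 0).any():
        raise ValueError("invalid probs: {}".format(probs))
    m = Categorical(probs)
    action = m.sample()
    return action, m.log_prob(action)
```

Calling this in place of the plain Categorical(probs).sample() should show which of the two tensors goes bad first.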

I had a similar problem in a reinforcement learning task. In my case, the softmax output had turned into a vector of NaNs, and Categorical then fails with the above error.

Were the values passed to softmax already NaNs, or did the softmax op create them?

So, were you able to solve this problem?

Did anyone solve this problem?

Do you see the same initial error message, or do you encounter NaN values?
As the error message states, negative probabilities are not supported.

CC @cuiguangwu

def getAction2(self, state):
    state = torch.FloatTensor(state)
    logits, _ = self.model2.forward(state)
    dist = F.softmax(logits, dim=-1)
    self.check2.append(dist)
    probs = Categorical(dist)
    return probs.sample()

I collected the output of the softmax at the point where the error is raised in the code. It contains NaN values.
The error:


  File "<ipython-input-12-986f834e9152>", line 1, in <module>
    runfile('C:/Users/Prudhvinath.DESKTOP-09Q8801/sciebo/Thesis/JSSP/TwoAgents/20JobsTwoAgents.py', wdir='C:/Users/Prudhvinath.DESKTOP-09Q8801/sciebo/Thesis/JSSP/TwoAgents')

  File "C:\Users\Prudhvinath.DESKTOP-09Q8801\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "C:\Users\Prudhvinath.DESKTOP-09Q8801\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/Prudhvinath.DESKTOP-09Q8801/sciebo/Thesis/JSSP/TwoAgents/20JobsTwoAgents.py", line 406, in <module>
    generateEpisode(x)

  File "C:/Users/Prudhvinath.DESKTOP-09Q8801/sciebo/Thesis/JSSP/TwoAgents/20JobsTwoAgents.py", line 337, in generateEpisode
    action = agent.getAction2(state)

  File "C:/Users/Prudhvinath.DESKTOP-09Q8801/sciebo/Thesis/JSSP/TwoAgents/20JobsTwoAgents.py", line 179, in getAction2
    return probs.sample()

  File "C:\Users\Prudhvinath.DESKTOP-09Q8801\Anaconda3\lib\site-packages\torch\distributions\categorical.py", line 107, in sample
    sample_2d = torch.multinomial(probs_2d, 1, True)

RuntimeError: invalid multinomial distribution (encountering probability entry < 0)

Checking the output of the softmax (I have already created an agent instance):

In [13]: agent.check2
Out[13]: [tensor([nan, nan], grad_fn=<SoftmaxBackward>)]

So the softmax is creating the NaNs.

The NaNs are not generated by the softmax. The model itself is producing the NaNs because of exploding gradients caused by the learning rate.
I checked my network weights, and they contain NaNs:

In [29]: for param in agent.model1.parameters():
    print(param.data)
    
tensor([[    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan],
        [-0.7366, -0.1832, -0.1841,  0.3141,  0.0334, -0.0575, -0.0015,  0.0069,
         -0.1040],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan]])
tensor([   nan,    nan,    nan,    nan, 0.1520,    nan,    nan,    nan,    nan,
           nan])
tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]])
tensor([nan, nan])
tensor([[ 4.5539e+00, -4.2161e+00,  2.1670e-01, -2.9069e+00, -4.2264e+00,
         -4.5930e+00, -3.9898e+00,  4.3225e+00,  4.2810e+00],
        [ 4.4970e+00, -4.4960e+00,  3.8166e+00, -2.9493e+00, -4.6124e+00,
         -4.6803e+00, -3.9606e+00,  4.0320e+00,  4.2299e+00],
        [-3.0815e-01, -8.5769e-02,  3.1897e-01,  3.0792e-01, -7.2797e-02,
          1.2266e-02,  2.3653e-01,  2.4632e-01,  3.8586e-02],
        [-8.0415e-02,  9.2836e-02,  3.7588e-01,  3.3804e-01,  1.5777e-02,
          7.4958e-02, -6.0354e-02,  8.1592e-02, -3.8448e-01],
        [-1.4028e-01,  1.0577e-01,  3.7370e-01,  2.5323e-01,  1.0640e-01,
         -1.5946e-01,  1.5165e-01, -1.5983e-04,  1.1991e-01],
        [ 3.5704e-02,  8.9192e-02, -4.2397e-02, -1.8162e-01,  2.7302e-02,
         -8.2681e-02,  2.4023e-01,  1.8748e-01, -3.6148e-01],
        [ 4.4627e+00, -4.1383e+00,  1.0344e+00, -2.9011e+00, -4.5565e+00,
         -4.4978e+00, -3.5755e+00,  4.2879e+00,  4.3042e+00],
        [ 4.0489e+00, -3.4591e+00,  1.2957e+00, -2.6597e+00, -3.5383e+00,
         -3.4929e+00, -3.4063e+00,  3.7202e+00,  4.0741e+00],
        [-5.6180e-02,  3.9072e-02, -5.6076e-02,  3.0225e-01, -9.5747e-02,
          1.5115e-01,  1.0766e-02,  2.7571e-01, -3.0291e-01],
        [ 4.2704e+00, -4.1625e+00,  9.1064e-01, -3.1105e+00, -4.2028e+00,
         -4.3451e+00, -3.8066e+00,  4.3735e+00,  4.4310e+00]])
tensor([ 4.3880,  3.9937, -0.0354, -0.0842, -0.1234, -0.2485,  4.0131,  4.0434,
        -0.3865,  4.1969])
tensor([[ 4.3842,  4.5874, -0.0854, -0.1604, -0.2546, -0.0998,  4.6371,  4.2951,
         -0.0997,  4.7081]])
tensor([3.3452])

I think the problem here is exploding or vanishing gradients. I am trying gradient clipping so that the gradients are computed without exploding.
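
The manual inspection of the weights above can be automated. This is only a sketch: `non_finite_parameters` is a hypothetical helper, and the nn.Linear layer is a stand-in for the actual agent model, which is not shown in this thread.

```python
import torch
import torch.nn as nn


def non_finite_parameters(model):
    """Return the names of parameters that contain NaN or Inf entries."""
    return [
        name
        for name, param in model.named_parameters()
        if not torch.isfinite(param).all()
    ]


# Hypothetical stand-in for the real network.
model = nn.Linear(9, 10)
healthy = non_finite_parameters(model)

# Corrupt a single weight to simulate a diverged update.
with torch.no_grad():
    model.weight[0, 0] = float("nan")
corrupted = non_finite_parameters(model)
```

Running such a check after every optimizer step would show the first update at which the weights diverge, rather than only the later crash in sample().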

Hey did you have success? I have the same problem.

How do I clip the gradients in PyTorch?

You could use torch.nn.utils.clip_grad_norm_ or torch.nn.utils.clip_grad_value_.
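
A minimal sketch of where the clipping call goes in a training step. The tiny model, the SGD optimizer, and the deliberately oversized loss are assumptions made up for illustration. clip_grad_norm_ also returns the total gradient norm it measured, which can be logged to see whether the gradients are actually exploding.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)  # hypothetical stand-in for the policy network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Deliberately large loss so that the gradients are large as well.
loss = (model(torch.randn(8, 4)) * 1e3).pow(2).mean()

optimizer.zero_grad()
loss.backward()

# Rescale all gradients so that their combined L2 norm is at most max_norm.
# The return value is the norm measured before rescaling.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)

# Recompute the norm after clipping to confirm the bound.
clipped_norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))

optimizer.step()
```

torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=1.0) would go in the same place if each gradient entry should be bounded individually instead.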

Great, thanks :)

I currently use it like this:

        self.optimizer.zero_grad()
        loss = self.criterion(log_action_probabilities, rewards)
        loss.backward()
        # clipping to prevent nans:
        # see https://discuss.pytorch.org/t/proper-way-to-do-gradient-clipping/191/6
        torch.nn.utils.clip_grad_norm_(self.parameters(), 5)
        self.optimizer.step()
        self.log_action_probabilities.clear()
        self.rewards.clear()

With the clipping I try to get rid of this error:


    action_idx = distribution.sample()      #
  File "/home/markus/Documents/06_Software_Projects/mcts/mcts_env/lib/python3.6/site-packages/torch/distributions/categorical.py", line 107, in sample
    sample_2d = torch.multinomial(probs_2d, 1, True)
RuntimeError: invalid multinomial distribution (encountering probability entry < 0)

Is this the right case / way to use the clipping?

Does anyone know other methods of handling the above error such as: Using a different network (did not help for me), using other optimizers, other loss functions?

My current problem is that with a learning rate lower than 0.1 my algorithm seems to learn nothing, but with a learning rate that high I get the above error. Clipping in the way shown above does not help:

    def forward(self, state: torch.tensor, legalCards: torch.tensor):
        state = state.resize_(180)
        probs = self.network(torch.FloatTensor(state))
        probs = probs * legalCards
        distribution = Categorical(probs)
        print(probs)
        print(distribution)            #
        action_idx = distribution.sample()      #
        log_action_probability = distribution.log_prob(action_idx)
        self.log_action_probabilities.append(log_action_probability)

Will produce this output after some time:

tensor([0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 5.1009e-16, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 5.0108e-22, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
       grad_fn=<MulBackward0>)
Categorical(probs: torch.Size([60]))



tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
        nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
        nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       grad_fn=<MulBackward0>)
Categorical(probs: torch.Size([60]))

Maybe the problem is that probs goes to zero…
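
One alternative to multiplying probs by the legal-move mask is to mask the scores before they are normalized. As far as I understand, Categorical divides probs by their sum, so a product that is all zeros cannot be normalized. The sketch below is a different technique from the code above, not a fix confirmed in this thread: it assumes the network can return unnormalized logits and that legalCards can be expressed as a boolean mask. The function name `masked_categorical` is made up.

```python
import torch
from torch.distributions import Categorical


def masked_categorical(logits, legal_mask):
    """Build a Categorical that puts zero probability on illegal actions.

    logits:     unnormalized scores, shape (num_actions,)
    legal_mask: bool tensor of the same shape, True for legal actions
    """
    if not legal_mask.any():
        raise ValueError("no legal action available")
    # Illegal actions get a score of -inf, i.e. probability 0 after normalization.
    masked_logits = logits.masked_fill(~legal_mask, float("-inf"))
    return Categorical(logits=masked_logits)


# Usage with made-up values: only actions 1 and 3 are legal.
logits = torch.zeros(4)
legal = torch.tensor([False, True, False, True])
dist = masked_categorical(logits, legal)
samples = [dist.sample().item() for _ in range(50)]
```

This keeps the legal actions normalized among themselves, but it does not help if the logits are already NaN because the weights have diverged.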

I’m not really familiar with RL, but could you clip probs_2d to valid probabilities, i.e. avoid negative values?

The usage of grad clipping looks alright.

Hm, sorry, I am quite new to this. What exactly do you mean by clipping probs_2d? How would I do that?

Like this?


        probs = self.network(torch.FloatTensor(state))
        probs = probs * legalCards
        probs = torch.nn.utils.clip_grad_norm_(probs, 5)
        distribution = Categorical(probs)

I meant something like torch.clamp(probs_2d, 0, 1), which would propagate the gradient for values inside the interval. But as I said, I’m not sure if that’s a valid approach for your method.
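
A small sketch of what that clamp does, using made-up values: entries outside [0, 1] are moved to the nearest bound and receive zero gradient, while entries inside the interval pass through unchanged with gradient 1.

```python
import torch

# Made-up values: one negative entry, one valid entry, one entry above 1.
probs = torch.tensor([-0.1, 0.3, 1.2], requires_grad=True)

clamped = torch.clamp(probs, 0, 1)
clamped.sum().backward()

# Gradient is 1 where the value was inside [0, 1] and 0 where it was clamped.
grad = probs.grad
```

Note that this only bounds out-of-range values. It does not by itself explain where NaN entries come from, so those still need to be traced back to the weights or the input.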
