Categorical(probs).sample() generates RuntimeError: invalid argument 2: invalid multinomial distribution (encountering probability entry < 0)

edowson · October 16, 2018, 3:48pm

Hi,
I’m working on an adaptation of the PyTorch actor_critic.py example for an RRBot within an OpenAI ROS Kinetic Gazebo 7 environment.

  def select_action(self, state):
    state = torch.from_numpy(state).float()
    probs, state_value = self.model(state)
    m = Categorical(probs)
    action = m.sample()
    self.model.saved_actions.append(self.saved_action(m.log_prob(action), state_value))
    return action.item()

At some point, either during initialization or when the RRBot swing-up task is approximately in this state during simulation:

[screenshot of the simulation state not preserved]

I consistently get the following run-time error:

[WARN] [1539704325.305267, 1074.472000]: PUBLISHING REWARD...
[WARN] [1539704325.305413, 1074.472000]: PUBLISHING REWARD...DONE=0.0,EP=13
Traceback (most recent call last):
  File "/project/ros-kinetic-deep-rl/catkin_ws/src/rrbot_openai_ros_tutorial/src/rrbot_v0_start_training_actor_critic.py", line 204, in <module>
    main()
  File "/project/ros-kinetic-deep-rl/catkin_ws/src/rrbot_openai_ros_tutorial/src/rrbot_v0_start_training_actor_critic.py", line 125, in main
    action = agent.select_action(state)
  File "/project/ros-kinetic-deep-rl/catkin_ws/src/rrbot_openai_ros_tutorial/src/rrbot_v0_start_training_actor_critic.py", line 64, in select_action
    action = m.sample()
  File "/usr/local/lib/python2.7/dist-packages/torch/distributions/categorical.py", line 110, in sample
    sample_2d = torch.multinomial(probs_2d, 1, True)
RuntimeError: invalid argument 2: invalid multinomial distribution (encountering probability entry < 0) at /pytorch/aten/src/TH/generic/THTensorRandom.cpp:297
[DEBUG] [1539704325.306117, 1074.472000]: END Reseting RobotGazeboEnvironment

edowson · October 19, 2018, 6:20am

@ptrblck Any thoughts on what might be causing this? I tried changing the learning rate, but it still crashes with the above message.

ptrblck · October 19, 2018, 1:21pm

That’s a strange issue, as the probs passed to Categorical are created using F.softmax(action_scores, dim=-1). Could you check if the dimension is set properly for your action_scores? F.softmax should not return negative values.
Also, could you add a print statement for action_scores and probs, just for the sake of debugging?
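A minimal sketch of the kind of check suggested here, assuming the policy head returns raw scores that are passed through F.softmax before Categorical. The helper name `checked_categorical` is hypothetical and not part of PyTorch; it only makes the failure surface earlier and with a clearer message than the multinomial error.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Categorical


def checked_categorical(action_scores: torch.Tensor) -> Categorical:
    """Build a Categorical from raw scores, failing early on NaN/inf scores."""
    if not torch.isfinite(action_scores).all():
        # Report the offending values instead of letting multinomial fail later.
        raise ValueError(f"non-finite action scores: {action_scores}")
    probs = F.softmax(action_scores, dim=-1)
    return Categorical(probs=probs)


# Finite scores produce a valid distribution.
good = checked_categorical(torch.tensor([0.5, -1.0, 2.0]))
action = good.sample()

# A NaN score is caught before sampling.
try:
    checked_categorical(torch.tensor([0.5, float("nan"), 2.0]))
    caught = False
except ValueError:
    caught = True
```

If the check fires on the very first step, the input state or the initial weights are suspect; if it fires only after some training, the update step is the more likely culprit.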

alextheoldgreyhorse · June 15, 2019, 11:41pm

I had a similar problem in a reinforcement learning project. In my case, the softmax output had turned into a vector of lovely NaNs, and Categorical then fails with the above error.

ptrblck · June 16, 2019, 12:22am

Were the values passed to softmax already NaNs, or did the softmax op create them?
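One way to answer this question in practice is to test the values on both sides of the softmax. The following is a sketch under the assumption that the logits are available as a tensor; `locate_nan` is a hypothetical helper name.

```python
import torch
import torch.nn.functional as F


def locate_nan(logits: torch.Tensor) -> str:
    """Report where non-finite values first appear: 'input', 'softmax', or 'none'."""
    if not torch.isfinite(logits).all():
        return "input"
    probs = F.softmax(logits, dim=-1)
    if not torch.isfinite(probs).all():
        return "softmax"
    return "none"


finite_case = locate_nan(torch.tensor([1.0, 2.0, 3.0]))
# Large but finite logits: softmax subtracts the maximum, so this stays finite.
large_case = locate_nan(torch.tensor([1e4, -1e4, 0.0]))
nan_case = locate_nan(torch.tensor([1.0, float("nan"), 3.0]))
inf_case = locate_nan(torch.tensor([1.0, float("inf"), 3.0]))
```

A result of "input" means the network output was already bad, which points at the weights or the observations rather than at the softmax itself.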

cuiguangwu · January 1, 2020, 4:56pm

So were you able to solve this problem?

reddymap · January 2, 2020, 10:12am

Did anyone solve this problem?

ptrblck · January 3, 2020, 7:10am

Do you see the same initial error message, or do you encounter NaN values?
As the error message states, negative probabilities are not supported.

CC @cuiguangwu

reddymap · January 3, 2020, 11:34am

def getAction2(self, state):
    state = torch.FloatTensor(state)
    logits, _ = self.model2.forward(state)
    dist = F.softmax(logits, dim=-1)
    self.check2.append(dist)
    probs = Categorical(dist)
    return probs.sample()

I have collected the output of the softmax at the point where the error is raised in the code. It contains NaN values.
The error:


  File "<ipython-input-12-986f834e9152>", line 1, in <module>
    runfile('C:/Users/Prudhvinath.DESKTOP-09Q8801/sciebo/Thesis/JSSP/TwoAgents/20JobsTwoAgents.py', wdir='C:/Users/Prudhvinath.DESKTOP-09Q8801/sciebo/Thesis/JSSP/TwoAgents')

  File "C:\Users\Prudhvinath.DESKTOP-09Q8801\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "C:\Users\Prudhvinath.DESKTOP-09Q8801\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/Prudhvinath.DESKTOP-09Q8801/sciebo/Thesis/JSSP/TwoAgents/20JobsTwoAgents.py", line 406, in <module>
    generateEpisode(x)

  File "C:/Users/Prudhvinath.DESKTOP-09Q8801/sciebo/Thesis/JSSP/TwoAgents/20JobsTwoAgents.py", line 337, in generateEpisode
    action = agent.getAction2(state)

  File "C:/Users/Prudhvinath.DESKTOP-09Q8801/sciebo/Thesis/JSSP/TwoAgents/20JobsTwoAgents.py", line 179, in getAction2
    return probs.sample()

  File "C:\Users\Prudhvinath.DESKTOP-09Q8801\Anaconda3\lib\site-packages\torch\distributions\categorical.py", line 107, in sample
    sample_2d = torch.multinomial(probs_2d, 1, True)

RuntimeError: invalid multinomial distribution (encountering probability entry < 0)

Checking the output of the softmax (I had already created an agent instance):

In [13]: agent.check2
Out[13]: [tensor([nan, nan], grad_fn=<SoftmaxBackward>)]

So the softmax is creating the NaNs.

reddymap · January 3, 2020, 3:26pm

The NaNs are not generated by the softmax. The model itself is producing them, because the gradients explode due to the learning rate.
I checked my network weights, and they contain NaNs:

In [29]: for param in agent.model1.parameters():
    print(param.data)
    
tensor([[    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan],
        [-0.7366, -0.1832, -0.1841,  0.3141,  0.0334, -0.0575, -0.0015,  0.0069,
         -0.1040],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan],
        [    nan,     nan,     nan,     nan,     nan,     nan,     nan,     nan,
             nan]])
tensor([   nan,    nan,    nan,    nan, 0.1520,    nan,    nan,    nan,    nan,
           nan])
tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]])
tensor([nan, nan])
tensor([[ 4.5539e+00, -4.2161e+00,  2.1670e-01, -2.9069e+00, -4.2264e+00,
         -4.5930e+00, -3.9898e+00,  4.3225e+00,  4.2810e+00],
        [ 4.4970e+00, -4.4960e+00,  3.8166e+00, -2.9493e+00, -4.6124e+00,
         -4.6803e+00, -3.9606e+00,  4.0320e+00,  4.2299e+00],
        [-3.0815e-01, -8.5769e-02,  3.1897e-01,  3.0792e-01, -7.2797e-02,
          1.2266e-02,  2.3653e-01,  2.4632e-01,  3.8586e-02],
        [-8.0415e-02,  9.2836e-02,  3.7588e-01,  3.3804e-01,  1.5777e-02,
          7.4958e-02, -6.0354e-02,  8.1592e-02, -3.8448e-01],
        [-1.4028e-01,  1.0577e-01,  3.7370e-01,  2.5323e-01,  1.0640e-01,
         -1.5946e-01,  1.5165e-01, -1.5983e-04,  1.1991e-01],
        [ 3.5704e-02,  8.9192e-02, -4.2397e-02, -1.8162e-01,  2.7302e-02,
         -8.2681e-02,  2.4023e-01,  1.8748e-01, -3.6148e-01],
        [ 4.4627e+00, -4.1383e+00,  1.0344e+00, -2.9011e+00, -4.5565e+00,
         -4.4978e+00, -3.5755e+00,  4.2879e+00,  4.3042e+00],
        [ 4.0489e+00, -3.4591e+00,  1.2957e+00, -2.6597e+00, -3.5383e+00,
         -3.4929e+00, -3.4063e+00,  3.7202e+00,  4.0741e+00],
        [-5.6180e-02,  3.9072e-02, -5.6076e-02,  3.0225e-01, -9.5747e-02,
          1.5115e-01,  1.0766e-02,  2.7571e-01, -3.0291e-01],
        [ 4.2704e+00, -4.1625e+00,  9.1064e-01, -3.1105e+00, -4.2028e+00,
         -4.3451e+00, -3.8066e+00,  4.3735e+00,  4.4310e+00]])
tensor([ 4.3880,  3.9937, -0.0354, -0.0842, -0.1234, -0.2485,  4.0131,  4.0434,
        -0.3865,  4.1969])
tensor([[ 4.3842,  4.5874, -0.0854, -0.1604, -0.2546, -0.0998,  4.6371,  4.2951,
         -0.0997,  4.7081]])
tensor([3.3452])
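Printing every parameter works, but the output gets long. A compact alternative, sketched here under the assumption that the model is a regular torch.nn.Module, is to list only the names of the parameters that contain NaN or inf. The helper `non_finite_parameters` and the small stand-in network are hypothetical.

```python
import torch
import torch.nn as nn


def non_finite_parameters(model: nn.Module) -> list:
    """Return the names of all parameters containing NaN or inf entries."""
    return [
        name
        for name, param in model.named_parameters()
        if not torch.isfinite(param).all()
    ]


# Stand-in network; the layer sizes are illustrative only.
model = nn.Sequential(nn.Linear(9, 10), nn.ReLU(), nn.Linear(10, 2))
clean = non_finite_parameters(model)

# Corrupt a single weight to simulate a bad update.
with torch.no_grad():
    model[0].weight[0, 0] = float("nan")
corrupted = non_finite_parameters(model)
```

Calling such a check right after each optimizer step shows which update first corrupts the weights, which is usually more informative than inspecting them after the crash.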

reddymap · January 6, 2020, 3:18pm

I think the problem here is exploding or vanishing gradients. I am trying gradient clipping to keep the computed gradients from exploding.

CesMak · March 18, 2020, 8:33pm

Hey, did you have any success? I have the same problem.

How do I clip the gradients in PyTorch?

ptrblck · March 19, 2020, 3:21am

You could use torch.nn.utils.clip_grad_norm_ or torch.nn.utils.clip_grad_value_.
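For illustration, here is a small self-contained sketch of both functions. The model, data, and thresholds are placeholders chosen only to provoke large gradients; the key point is that clipping goes between loss.backward() and optimizer.step().

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Deliberately large inputs so the unclipped gradients are large too.
inputs = torch.randn(8, 4) * 1000.0
targets = torch.randn(8, 2)

# Variant 1: rescale all gradients so their combined L2 norm is at most max_norm.
max_norm = 5.0
optimizer.zero_grad()
loss = F.mse_loss(model(inputs), targets)
loss.backward()
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
optimizer.step()

# Variant 2: clamp every gradient element into [-clip_value, clip_value].
optimizer.zero_grad()
loss = F.mse_loss(model(inputs), targets)
loss.backward()
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=1.0)
max_abs_grad = max(p.grad.abs().max().item() for p in model.parameters())
optimizer.step()
```

clip_grad_norm_ returns the total norm measured before clipping, so logging that value is a cheap way to see whether gradients are actually exploding.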

CesMak · March 19, 2020, 7:07am

Great, thanks!

I currently use it like this:

        self.optimizer.zero_grad()
        loss = self.criterion(log_action_probabilities, rewards)
        loss.backward()
        # clipping to prevent nans:
        # see https://discuss.pytorch.org/t/proper-way-to-do-gradient-clipping/191/6
        torch.nn.utils.clip_grad_norm_(self.parameters(), 5)
        self.optimizer.step()
        self.log_action_probabilities.clear()
        self.rewards.clear()

With the clipping I am trying to get rid of this error:


    action_idx = distribution.sample()      #
  File "/home/markus/Documents/06_Software_Projects/mcts/mcts_env/lib/python3.6/site-packages/torch/distributions/categorical.py", line 107, in sample
    sample_2d = torch.multinomial(probs_2d, 1, True)
RuntimeError: invalid multinomial distribution (encountering probability entry < 0)

Is this the right situation for clipping, and the right way to use it?

Does anyone know other ways of handling the above error, such as using a different network (which did not help in my case), other optimizers, or other loss functions?

My current problem is that with a learning rate lower than 0.1 my algorithm seems to learn nothing, but with a learning rate this high I get the above error. Clipping as shown above does not help:

    def forward(self, state: torch.tensor, legalCards: torch.tensor):
        state = state.resize_(180)
        probs = self.network(torch.FloatTensor(state))
        probs = probs * legalCards
        distribution = Categorical(probs)
        print(probs)
        print(distribution)            #
        action_idx = distribution.sample()      #
        log_action_probability = distribution.log_prob(action_idx)
        self.log_action_probabilities.append(log_action_probability)

This produces the following output after some time:

tensor([0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 5.1009e-16, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 5.0108e-22, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
       grad_fn=<MulBackward0>)
Categorical(probs: torch.Size([60]))



tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
        nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
        nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
       grad_fn=<MulBackward0>)
Categorical(probs: torch.Size([60]))

Maybe the problem is that probs goes to zero…
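If the masked probabilities can all underflow to zero, one commonly used alternative is to mask the logits instead of the probabilities. This is a different technique from the multiplication above, sketched here under the assumption that the network can return raw logits rather than softmax output. The helper `masked_categorical` is a hypothetical name, and the mask follows the same 0/1 convention as legalCards. It does not help if the network weights themselves are already NaN.

```python
import torch
from torch.distributions import Categorical


def masked_categorical(logits: torch.Tensor, legal_mask: torch.Tensor) -> Categorical:
    """Categorical over legal actions only; illegal actions get probability 0."""
    legal = legal_mask.bool()
    if not legal.any():
        raise ValueError("no legal action available")
    # -inf logits become exactly zero probability after normalization.
    masked_logits = logits.masked_fill(~legal, float("-inf"))
    return Categorical(logits=masked_logits)


torch.manual_seed(0)
# The legal actions (indices 1 and 3) have very low scores on purpose.
logits = torch.tensor([50.0, -50.0, 0.0, -80.0])
legal_cards = torch.tensor([0.0, 1.0, 0.0, 1.0])

dist = masked_categorical(logits, legal_cards)
samples = dist.sample((200,))
log_probs = dist.log_prob(samples)
```

Because the normalization happens in log space over the legal entries only, the remaining probabilities still sum to one even when the legal actions had tiny probability before masking.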

ptrblck · March 19, 2020, 8:17am

I’m not really familiar with RL, but could you clip probs_2d to valid probabilities, i.e. avoid negative values?

The usage of grad clipping looks alright.

CesMak · March 19, 2020, 11:36am

Hm, sorry, I am quite new to this. What exactly do you mean by clipping probs_2d? How would I do that?

Like this?


        probs = self.network(torch.FloatTensor(state))
        probs = probs * legalCards
        probs =   torch.nn.utils.clip_grad_norm_(probs , 5)
        distribution = Categorical(probs)

ptrblck · March 19, 2020, 6:56pm

I meant something like torch.clamp(probs_2d, 0, 1), which would propagate the gradient for values inside the interval. But as I said, I’m not sure if that’s a valid approach for your method.
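A rough sketch of that suggestion, with two additions that are assumptions rather than part of the reply: the lower bound is a small eps instead of 0 so that the sum cannot be zero, and the result is renormalized. The helper `sanitize_probs` is a hypothetical name. NaN values are rejected explicitly, since clamping is aimed at slightly out-of-range values, not at NaNs coming from corrupted weights.

```python
import torch
from torch.distributions import Categorical


def sanitize_probs(probs: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Clamp probabilities into [eps, 1] and renormalize along the last dim."""
    if torch.isnan(probs).any():
        raise ValueError("probs contains NaN; fix the source instead of clamping")
    clamped = torch.clamp(probs, min=eps, max=1.0)
    return clamped / clamped.sum(dim=-1, keepdim=True)


# A slightly negative entry, e.g. from rounding, is pulled back into range.
raw = torch.tensor([-1e-7, 0.3, 0.7])
clean = sanitize_probs(raw)
dist = Categorical(probs=clean)
action = dist.sample()

# NaN input is rejected rather than silently passed on.
try:
    sanitize_probs(torch.tensor([float("nan"), 0.5, 0.5]))
    nan_rejected = False
except ValueError:
    nan_rejected = True
```

As noted in the reply, the gradient only flows through entries that lie inside the clamp interval, so entries pinned at a bound stop receiving a learning signal.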