RuntimeError: invalid multinomial distribution

Hi, I am receiving this error while sampling. The cause of the error is the occurrence of `nan` values, but I am not sure why those values are there. I am using a softmax activation and I have also used torch.nn.utils.clip_grad_norm_, but I am still getting the same error.

Unhandled exception in thread started by <function service at 0x7fc0d202b140>
Traceback (most recent call last):
File "ppo_takeoff_8_actions.py", line 352, in service
action = drone.get_action(dist)
File "ppo_takeoff_8_actions.py", line 122, in get_action
ac = dist.sample()
File "/home/cuda/.local/lib/python2.7/site-packages/torch/distributions/categorical.py", line 107, in sample
sample_2d = torch.multinomial(probs_2d, 1, True)
RuntimeError: invalid multinomial distribution (encountering probability entry < 0)

Neural network definition

self.actor = nn.Sequential(
    nn.Linear(in_features=16, out_features=200),
    nn.ReLU(),
    nn.Linear(in_features=200, out_features=500),
    nn.ReLU(),
    nn.Linear(in_features=500, out_features=500),
    nn.ReLU(),
    nn.Linear(in_features=500, out_features=100),
    nn.ReLU(),
    nn.Linear(in_features=100, out_features=8),
    nn.Softmax(dim=-1)
)
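For what it's worth, the actor definition above can be smoke-tested in isolation to see whether it alone produces invalid probabilities — a minimal sketch (the layer sizes match the definition above; the input is random and purely illustrative):

```python
import torch
import torch.nn as nn

actor = nn.Sequential(
    nn.Linear(16, 200), nn.ReLU(),
    nn.Linear(200, 500), nn.ReLU(),
    nn.Linear(500, 500), nn.ReLU(),
    nn.Linear(500, 100), nn.ReLU(),
    nn.Linear(100, 8),
    nn.Softmax(dim=-1),
)

state = torch.randn(1, 16)  # illustrative stand-in for a real observation
probs = actor(state)

# Softmax output should be finite, non-negative, and sum to 1.
# If this assertion fires with the *real* input, the NaNs come from
# upstream (the input data or diverged weights), not the sampling call.
assert torch.isfinite(probs).all(), "actor produced NaN/Inf probabilities"
```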

back-propagation step

self.actor.optimizer.zero_grad()
self.critic.optimizer.zero_grad()
total_loss.backward()
T.nn.utils.clip_grad_norm_(self.actor.parameters(), 5)
T.nn.utils.clip_grad_norm_(self.critic.parameters(), 5)
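As a side note, gradient clipping only has an effect if it runs between `backward()` and `optimizer.step()` — a minimal sketch of the full update ordering (the toy model and optimizer here are illustrative, not the code above):

```python
import torch

# Toy model/optimizer purely to illustrate the ordering.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

loss = model(torch.randn(8, 4)).pow(2).mean()

optimizer.zero_grad()
loss.backward()
# Clip the *existing* gradients in place ...
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
# ... and only then apply them.
optimizer.step()
```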

Thank you

NaN values can be created from various sources:

  • your input to the model might contain invalid values (NaNs and/or Infs) and might thus “poison” your model
  • your training might diverge: the loss might explode and either directly create invalid loss values or produce invalid gradients during the backward pass
  • your gradients might explode due to an “unstable” operation in case you are using custom operations.
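The first point can be checked cheaply with `torch.isfinite` on each input batch before the forward pass — a small sketch (the tensor and names are placeholders):

```python
import torch

def check_finite(name, tensor):
    """Raise early if a tensor contains NaN or Inf values."""
    if not torch.isfinite(tensor).all():
        bad = (~torch.isfinite(tensor)).sum().item()
        raise ValueError(f"{name} contains {bad} non-finite values")

state = torch.randn(16)        # placeholder for the real observation
check_finite("state", state)   # passes for finite input

state[3] = float("inf")        # simulate one broken input component
try:
    check_finite("state", state)
except ValueError as e:
    print(e)                   # -> state contains 1 non-finite values
```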

You could start by checking the loss values and, e.g., lowering the learning rate if you see that the model diverges.
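To localize where the first invalid value is created during the backward pass, anomaly detection can also help — a sketch (it slows training down noticeably, so enable it only while debugging):

```python
import torch

x = torch.tensor([-1.0], requires_grad=True)
y = torch.sqrt(x)  # sqrt of a negative input -> NaN in the forward pass

# Anomaly mode checks every backward function's output for NaNs
# and raises a RuntimeError naming the failing op (here SqrtBackward).
with torch.autograd.set_detect_anomaly(True):
    try:
        y.backward()
        error = None
    except RuntimeError as e:
        error = e
        print("caught:", e)
```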


Ok, thanks, I will check all these things.

Hi, it was an input data error: one component was going to infinity. Thank you!