Categorical distribution breaking with NaN logits

Hey, I am not too sure what is going wrong with my code. I am using a categorical distribution and getting a fairly strange error, and I am uncertain why. I looked around online for a while, but nothing I found seemed to explain what exactly this issue is or how to get around it.

The error I am getting is as follows:

> ValueError: Expected parameter logits (Tensor of shape (1024, 6)) of distribution Categorical(logits: torch.Size([1024, 6])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
> tensor([[nan, nan, nan, nan, nan, nan],
>         [nan, nan, nan, nan, nan, nan],
>         [nan, nan, nan, nan, nan, nan],
>         ...,
>         [nan, nan, nan, nan, nan, nan],
>         [nan, nan, nan, nan, nan, nan],
>         [nan, nan, nan, nan, nan, nan]], device='cuda:0',
>        grad_fn=<SubBackward0>)

The code I am using to generate this is just a simple feed-forward network.

import torch
import torch.nn as nn

# layer_init is a weight-initialization helper defined elsewhere in my code;
# it returns the initialized layer.
class actor(nn.Module):
    def __init__(self, input_size, n_actions):
        super(actor, self).__init__()

        self.base = layer_init(nn.Linear(input_size, 512))
        self.actor = layer_init(nn.Linear(512, n_actions), std=0.01)

    def forward(self, x):
        x = x.clone()
        x = self.base(x)       # hidden layer
        x = torch.tanh(x)      # tanh activation
        x = self.actor(x)      # raw logits, one per action
        return x

The output of this just gets put into a Categorical distribution: probs = Categorical(logits=logits)
This seems to be where the error is occurring. The code does not break on the first run; it takes a couple hundred thousand steps before it breaks.
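For reference, the distribution step looks roughly like this (the shapes are just illustrative, matching the ones in the error message):

import torch
from torch.distributions import Categorical

# Stand-in for the actor's output: a (batch, n_actions) tensor of logits.
logits = torch.randn(1024, 6)

dist = Categorical(logits=logits)   # raises the ValueError above if logits contain NaN
action = dist.sample()              # shape (1024,)
log_prob = dist.log_prob(action)    # later used in the policy-gradient loss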

If anyone knows what the problem is and how to fix it, I would appreciate it immensely.

Based on the error message, it seems the actor is producing NaN outputs after a number of training iterations. Are you seeing the value range of its output increase during training, which could then overflow after a while?
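For example, a small check along these lines (check_logits is just a hypothetical helper name, not something from a library) could be dropped in right after the forward pass to catch the first non-finite or rapidly growing logits:

import torch

def check_logits(logits: torch.Tensor, step: int) -> None:
    # Fail fast on the first NaN/inf instead of inside Categorical.
    if not torch.isfinite(logits).all():
        raise RuntimeError(f"non-finite logits at step {step}")
    # Warn when the magnitude of the logits starts to blow up.
    max_abs = logits.abs().max().item()
    if max_abs > 1e3:
        print(f"step {step}: max |logit| = {max_abs:.1f}")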

Hi, yeah, when I was attempting to debug it yesterday I noticed that the augmentation I had added to my loss function started to give extremely large values. I changed this to clip that term if it goes outside a certain range, and it seems like the error is fixed. So I think you are correct and overflow was causing the issue.

Thank you for the assistance.

Hi, I have encountered a similar issue when training a PPO agent with discrete actions. Can you give a hint on how you set the clipping range for the loss you mentioned here? Thank you!

Hey, yeah, not a problem.

The values were going way out into the range of a couple hundred, and then slowly exploded into the thousands and beyond. Basically, as the probability of an action got really small, the log terms got out of control. So I just clipped the absolute value to the range 2 to 20. In this case, if the value you are dealing with is the log of a probability, it should always be negative, so you can probably clip just the negative side; clipping both sides is a bit more robust, though.
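In case a concrete sketch helps, this is roughly what that clipping looks like (the bounds 2 and 20 are just the ones from this thread, not a general recommendation):

import torch

# Log-probabilities of valid actions are <= 0, so clamping them to [-20, -2]
# bounds their magnitude between 2 and 20 and stops the loss term exploding
# when an action's probability gets tiny.
log_prob = torch.tensor([-0.5, -35.0, -120.0])
clipped = torch.clamp(log_prob, min=-20.0, max=-2.0)
print(clipped)  # tensor([ -2., -20., -20.])

# If positive values can show up as well, clip on the absolute value instead:
clipped_abs = torch.sign(log_prob) * torch.clamp(log_prob.abs(), min=2.0, max=20.0)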

Thank you very much for the explanation!