Why does my model return nan?

DXZ_999 · September 1, 2018, 4:25pm

The model is here:

import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_size, action_size, hidden_size=512):
        super(Actor, self).__init__()
        
        self.state_size = state_size
        self.hidden_size = hidden_size
        self.action_size = action_size
        
        self.block_state = nn.Sequential(
            nn.Linear(state_size, hidden_size),
            nn.LayerNorm(hidden_size),
            nn.ReLU(),
        )
        
        self.block_hidden = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.LayerNorm(hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
            nn.LayerNorm(hidden_size),
            nn.ReLU(),
        )
        
        self.block_mean = nn.Sequential(
            nn.Linear(hidden_size, action_size),
        )
        
        self.block_std = nn.Sequential(
            nn.Linear(hidden_size, action_size),
        )
        
    def forward(self, state):
        out = self.block_state(state)
        out = self.block_hidden(out)
        mean = self.block_mean(out)
        std = self.block_std(out)
        return mean, std

The output is:
tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan]], grad_fn=)
tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan]], grad_fn=)
I’m sure the input doesn’t contain any nan value.

rasbt · September 1, 2018, 4:27pm

There are many potential causes, most likely exploding gradients. The two things to try first:

1. Normalize the inputs
2. Lower the learning rate
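
A minimal sketch of the first suggestion, assuming the states arrive as a batch of shape (batch, state_size). The helper name `standardize`, the `eps` value, and the synthetic data are illustrative assumptions, not part of the original model.

```python
import torch

def standardize(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Scale each feature (column) to roughly zero mean and unit std."""
    mean = x.mean(dim=0, keepdim=True)
    std = x.std(dim=0, keepdim=True)
    # eps guards against division by zero for constant features
    return (x - mean) / (std + eps)

torch.manual_seed(0)
# Synthetic stand-in for raw states with a large offset and scale
states = torch.randn(64, 8) * 1000.0 + 500.0
normalized = standardize(states)
```

In practice the mean and std would usually be computed once on training data (or tracked as running statistics) and reused at inference time.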

DXZ_999 · September 1, 2018, 4:35pm

I will try normalizing the inputs and see whether it works.
It can’t be a learning rate problem, since it happens right at the beginning.

ptrblck · September 1, 2018, 4:56pm

Could you additionally check your input for inf or nan values?
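
One way to run this check is sketched below. The helper name `count_nonfinite` and the example tensor are made up for illustration; `torch.isnan` and `torch.isinf` are the actual PyTorch calls.

```python
import torch

def count_nonfinite(x: torch.Tensor) -> dict:
    """Count nan and inf entries in a tensor."""
    return {
        "nan": int(torch.isnan(x).sum()),
        "inf": int(torch.isinf(x).sum()),
    }

# Example input with one inf and one nan planted in it
state = torch.tensor([[0.5, float("inf"), -1.0, float("nan")]])
report = count_nonfinite(state)
print(report)
```

Checking for inf matters as much as checking for nan, since operations such as inf - inf or 0 * inf turn an inf into nan further down the network.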

DXZ_999 · September 1, 2018, 5:19pm

The input doesn’t contain any nan values. I guess something goes wrong in block_hidden, since the outputs of both block_mean and block_std contain nan values.
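
To narrow down which submodule first produces a non-finite value, forward hooks can be attached to every layer. This is only a sketch: `find_nonfinite_modules` is a hypothetical helper, and the toy `nn.Sequential` with a deliberately corrupted weight stands in for the Actor model.

```python
import torch
import torch.nn as nn

def find_nonfinite_modules(model: nn.Module, example_input: torch.Tensor) -> list:
    """Return names of submodules whose output contains nan/inf, in call order."""
    bad, handles = [], []

    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                bad.append(name)
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module itself
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(example_input)
    finally:
        for handle in handles:
            handle.remove()  # always detach the hooks again
    return bad

# Toy model: plant nan in the last layer's weights to simulate the failure
toy = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
with torch.no_grad():
    toy[2].weight.fill_(float("nan"))
offenders = find_nonfinite_modules(toy, torch.ones(1, 4))
print(offenders)
```

The first name in the returned list is the earliest layer whose output went bad, which is usually the place to start looking.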

wangwwno1 · October 18, 2019, 9:03am

@DXZ_999 @rasbt
Hello, there is another possibility: if the output contains some very large values (abs(value) > 1e20), then nn.LayerNorm(output) might return an all-nan vector.
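
The sketch below illustrates why values of this size are dangerous in float32. It does not call into LayerNorm's internals; it only shows that a naive variance formula, E[x^2] - E[x]^2, overflows to inf - inf = nan. Whether nn.LayerNorm itself returns nan for such input may depend on the PyTorch version and device. The clamp bound of 1e4 is an arbitrary assumption used for illustration.

```python
import torch
import torch.nn.functional as F

# float32 tops out near 3.4e38, so squaring 1e20 already overflows
x = torch.tensor([[1e20, -3e20, 2.0, 0.5]], dtype=torch.float32)

# Naive variance: both terms overflow to inf, and inf - inf is nan
naive_var = (x * x).mean() - x.mean() ** 2
print(naive_var)

# Hypothetical guard: bound the activations before normalizing
clamped = x.clamp(min=-1e4, max=1e4)
normed = F.layer_norm(clamped, normalized_shape=(4,))
print(normed)
```

Clamping hides the symptom rather than explaining where the huge activations come from, so it is best treated as a diagnostic aid.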

A similar problem happens in my attention model. I’m pretty sure it can’t be exploding gradients in my case because:

1. The model converges (after some iterations, the loss is low and stable).
2. Debugging shows that only a limited number of samples have this problem.
3. It is so rare that I have to use torch.any(torch.isnan(x)) to catch it, and even then it takes multiple runs to catch one example.
4. Only intermediate results become nan; input normalization is implemented, but the problem still exists.

My model handles time-series sequences. If a single vector is ‘infected’ with nan, it propagates and ruins the whole output, so I would like to know whether this is a bug and whether there is a way to address it.

Flock1 · August 6, 2021, 3:28pm

This might sound weird, but try restarting your machine. I was facing some issue with the GPU and had to restart the system, and to my surprise, it started training.

Ahsan_Habib · October 14, 2021, 6:39am

I was facing a similar issue, and passing each sample through torch.nan_to_num() did the trick. I noticed that only very few samples contained nan.
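
A short illustration of that call, with made-up sample values. The replacement values passed for `nan`, `posinf`, and `neginf` are arbitrary choices for this example.

```python
import torch

# One sample with a nan, a +inf and a -inf planted in it
sample = torch.tensor([1.0, float("nan"), float("inf"), float("-inf")])

# Replace nan with 0 and clip the infinities to large finite values
cleaned = torch.nan_to_num(sample, nan=0.0, posinf=1e6, neginf=-1e6)
print(cleaned)
```

If `posinf` and `neginf` are left unset, the infinities are replaced by the largest and smallest finite values of the dtype, which can themselves overflow in later layers.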

GoingMyWay · November 16, 2021, 2:52am

Please check the weights. The weights could be nan!
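
A quick way to do that is to scan the model's parameters. The helper name `nonfinite_parameters` is hypothetical, and the single `nn.Linear` layer with a planted nan is a stand-in for a real model.

```python
import torch
import torch.nn as nn

def nonfinite_parameters(model: nn.Module) -> list:
    """Return the names of parameters that contain nan or inf."""
    return [
        name
        for name, param in model.named_parameters()
        if not torch.isfinite(param).all()
    ]

layer = nn.Linear(3, 2)
with torch.no_grad():
    layer.bias[0] = float("nan")  # simulate a corrupted weight
bad_params = nonfinite_parameters(layer)
print(bad_params)
```

Running this right after initialization and again after the first optimizer step helps tell a bad initialization apart from a bad update.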

vincentmichael089 · April 22, 2022, 3:52am

I recommend calling torch.max(your_tensor) and torch.min(your_tensor) to check whether any of your tensors contains “inf”.
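
For example, with an illustrative tensor whose values overflow float32 after a multiplication:

```python
import torch

# 3.0e38 * 10 exceeds the float32 maximum (about 3.4e38) and becomes inf
activations = torch.tensor([0.1, 3.0e38, -2.0]) * 10.0

lo = torch.min(activations).item()
hi = torch.max(activations).item()
print(lo, hi)
```

Printing the range after each block of the network shows where the values start to grow, often well before the first inf or nan appears.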

snknitin · November 2, 2022, 6:55am

One of your features probably has a very wide range of values, so that even after standardization it can cause underflow or overflow issues. During training, the model might see very low values for that feature in some batches and assign it a really high weight; then some data point suddenly has a high value and the output explodes. If there is a single nan in your predictions, your loss turns to nan and the model won’t train or update anymore. You can work around that in the loss function, but the weight will remain high. Delete unnecessary features whose distribution has a really wide range; scaling or normalizing them might not help.

Flock1 · December 3, 2022, 2:03am

Definitely check the weights. I once checked the weights and some of them were 0.0. I added 0.01 to all of them and then it started training.

hitish_singla · October 26, 2023, 4:12pm

Check whether you have NaN values in your dataset.
Don’t forget to normalize your data.

Ko_R · January 16, 2024, 5:46am

If you use
torch.sqrt(x)
or any other function that is ill-behaved at zero,
make sure the input contains no 0 values.
Adding a small number is one way to improve numerical stability:
torch.sqrt(x + 1e-8)
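
The sketch below shows where the trouble with sqrt comes from: the forward value at 0 is fine, but the gradient 1 / (2 * sqrt(x)) is infinite there, and that inf can become nan once it is multiplied by zero further back in the graph. The tensor names are illustrative.

```python
import torch

# Plain sqrt at zero: the forward pass gives 0, the gradient is inf
x = torch.zeros(3, requires_grad=True)
torch.sqrt(x).sum().backward()
grad_plain = x.grad.clone()
print(grad_plain)

# With a small epsilon the gradient stays finite (large, but usable)
x_eps = torch.zeros(3, requires_grad=True)
torch.sqrt(x_eps + 1e-8).sum().backward()
grad_eps = x_eps.grad.clone()
print(grad_eps)
```

The same pattern applies to torch.log and to divisions, where the epsilon goes inside the log or in the denominator.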