Torch randn operation gives NaN values in training loop

I'm experiencing something really strange. The torch.randn operation repeatedly gives me NaN values in some positions of the tensor.

It gets weirder: the while loop sometimes never exits. Once it gets caught in the loop, the NaN values no longer seem "random": they keep being produced.
And it gets even weirder: when I run the same code for producing the random latent code in a separate file, this never happens. It never creates any NaN values there.
I have no idea what to do about this. Did anyone else experience this?
(I also have the whole code on GitHub if that helps:

torch.Tensor is not calling randn but is returning an uninitialized tensor, which can contain arbitrary values, including invalid ones such as NaN (the same as if you were using torch.empty).
Since this behavior is easy to miss, the torch.Tensor constructor (uppercase T) is deprecated, and you should instead use the factory functions: torch.randn (samples from a normal distribution), torch.tensor (to initialize the tensor with pre-defined values), torch.empty (in case you want uninitialized memory and will fill it later), etc.
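To make the difference concrete, here is a small sketch (the shapes are arbitrary placeholders, not taken from your code):

```python
import torch

# torch.Tensor(3, 4) allocates UNINITIALIZED memory, like torch.empty:
# the contents are whatever bytes happened to be there, possibly NaN or inf.
uninit = torch.Tensor(3, 4)   # deprecated constructor; avoid in new code

# The recommended factory functions make the intent explicit:
sampled = torch.randn(3, 4)            # i.i.d. samples from N(0, 1)
from_data = torch.tensor([1.0, 2.0])   # tensor built from given values
empty = torch.empty(3, 4)              # explicitly uninitialized, fill later

# randn output is always finite; the uninitialized tensors may not be.
print(torch.isfinite(sampled).all().item())  # True
```

So if your training loop builds the latent code with torch.Tensor(...) and then checks for NaNs in a while loop, it can spin forever on whatever garbage the allocation happened to return; replacing it with torch.randn(...) should fix it.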
