torch.randn() gives NaNs

Hi :),

I’m not sure if anyone else has already noticed this (or am I the only one?), but I sometimes get NaNs in torch.randn().
This then leads to NaNs in the batch norm, etc.

Am I doing something wrong? Right now I simply work around it with a while loop that checks whether there are any NaNs in the tensor (but I think there should be a better solution).

Hi,

I use randn quite a lot and have never seen that; also, all our tests are based on randn and would fail if it returned any NaN.
You may want to make sure that you are not creating a tensor without initializing it.
If you still have the issue, could you please provide some code to reproduce it so that we can look into it in more detail?
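
To make the point about initialization concrete, here is a small sketch (not from your code, just an illustration): a tensor created with torch.Tensor(size) is left uninitialized and can contain whatever was in memory, including NaNs, while torch.randn fills the tensor with samples from a standard normal distribution.

import torch
import numpy as np

uninitialized = torch.Tensor(3, 3)  # uninitialized memory, may hold garbage or NaN
sampled = torch.randn(3, 3)         # drawn from N(0, 1), should never contain NaN

print(np.isnan(uninitialized.numpy()).any())  # can be True
print(np.isnan(sampled.numpy()).any())        # expected to be False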

Basically my code right now looks like this:

import torch
import numpy as np

noise = torch.randn(x.size())
while np.any(np.isnan(noise.numpy())):
    noise = torch.randn(x.size())

and it is quite reproducible (at least for me, that’s why I was wondering why nobody else mentioned it).
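
As a side note, if your PyTorch version provides torch.isnan (later releases do), the same check could presumably be done without going through NumPy; a rough sketch:

import torch

noise = torch.randn(x.size())
while torch.isnan(noise).any():
    noise = torch.randn(x.size())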

Hi,

What is the size of x in your example above? Using 1000x1000 works fine for me.
Also, did you forget a not in the while loop condition of your code sample above?
Even after letting it run for a while, I still don’t get any problem.

@dzimm, are you on Linux, OSX, or Windows? If you can reproduce this in a Linux Docker image, I am very interested.

I’m on Ubuntu 16.04 with a Titan XP, the official NVIDIA CUDA drivers, and cuDNN 6 (on the PyTorch 0.2.0 release).

The code is the actual code I use right now (for every batch), so the loop is meant to resample whenever NaNs appear (that’s why there is no not :wink: ). After roughly 10,000 runs/batches I have ended up inside the while loop at least once. My current x.size() is (64, 1, 32, 32).
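
For what it’s worth, the kind of stress test I run looks roughly like this (just a sketch; the shape and the rough iteration count are the ones from my setup above):

import torch
import numpy as np

# Draw ~10,000 batches of noise with my batch shape and report
# any iteration where a NaN shows up.
for i in range(10000):
    noise = torch.randn(64, 1, 32, 32)
    if np.any(np.isnan(noise.numpy())):
        print('NaN at iteration', i)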

If I have some time, I’ll try to reproduce it in a Docker image :slight_smile:.
