[Solved] Get 'nan' for input after applying a convolutional layer

Then the data has to be emailed…QAQ

Could you please give me your email address so that I can send you the data? Otherwise the error cannot be reproduced, because I’ve tried many different random 5D inputs and they all work fine. The error seems to happen so randomly that I can’t tell whether it comes from my data or from a bug in the functions.

Oh, I see what you mean now. Yes, it’s indeed because the input to the unpooling layer has nan, and that’s because the outputs of the previous layers are nan. That’s why I asked why I’m getting nan after a convolution. Thanks!

And I’ve also tried a batch size of 1, both with and without batch normalization. Both work fine. QAQ

And I don’t think the error comes from the data, because if I take the sample that causes the error and apply a conv layer to it on its own, it doesn’t give me any nan values. QAQ

Okay, now check why make_training_samples is giving nan. You can step through it line by line and inspect things if you use pdb: https://github.com/spiside/pdb-tutorial#pdb-101-intro-to-pdb
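For example, something along these lines (the body of make_training_samples here is just a placeholder for your own loading code):

```python
import numpy as np
import pdb

def make_training_samples():
    # Placeholder for the real nifti loading / patch extraction.
    imgdata = np.random.rand(1, 1, 16, 16, 16).astype(np.float32)
    pdb.set_trace()  # pauses here: try `p imgdata`, `p np.isnan(imgdata).any()`, `n` to step
    return imgdata

sample = make_training_samples()
```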

Thanks so much for the help!!

The thing is, make_training_samples is not giving nan.
I’ve checked this by doing:

pd.isnull(imgdata).any()
False

It’s the convolutional layer that gives the nan…
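To be concrete, the check looks essentially like this when I reduce it to a self-contained sketch (a random array stands in for my real volume and a single Conv3d stands in for the network):

```python
import numpy as np
import torch
import torch.nn as nn

imgdata = np.random.rand(1, 1, 16, 16, 16).astype(np.float32)  # random stand-in for my volume

print(np.isnan(imgdata).any())        # False -> the input itself contains no nan

conv = nn.Conv3d(1, 8, kernel_size=3, padding=1)
out = conv(torch.from_numpy(imgdata))
print(torch.isnan(out).any().item())  # whether the conv output contains nan
```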

Would it be possible that:

  1. the weights of the conv layer are not properly initialized? (You can see that I didn’t apply any particular weight initialization method in the network; see the sketch after this list.)
  2. I should really be giving it a batch instead of just one input at a time?
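For point 1, this is the kind of explicit initialization I have in mind (just a sketch; the Conv3d below is a placeholder, not my actual network). I know PyTorch gives conv layers a default initialization anyway, so maybe this isn’t the issue, but would setting it explicitly help?

```python
import torch.nn as nn

def init_weights(m):
    # Explicit Kaiming initialization for conv layers, zero bias.
    if isinstance(m, (nn.Conv3d, nn.ConvTranspose3d)):
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)

net = nn.Sequential(  # placeholder layers, not the real network
    nn.Conv3d(1, 8, kernel_size=3, padding=1),
    nn.PReLU(),
)
net.apply(init_weights)
```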

Thanks!!

Btw, I haven’t figured out how to give a batch of inputs to the network, since my data is in nifti format and DataLoader doesn’t support nifti, and I’m having a hard time converting a 5D array with dtype(‘O’) to a 5D tensor, which can’t be done directly with torch.from_numpy.
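Would wrapping the nifti files in a small custom Dataset be the right direction? Something like this rough sketch (the nibabel call is commented out and a random array stands in for the real volume):

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
# import nibabel as nib

class NiftiDataset(Dataset):
    def __init__(self, paths):
        self.paths = paths

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # vol = nib.load(self.paths[idx]).get_fdata().astype(np.float32)
        vol = np.random.rand(32, 32, 32).astype(np.float32)  # stand-in volume
        return torch.from_numpy(vol).unsqueeze(0)             # (C, D, H, W)

loader = DataLoader(NiftiDataset(['a.nii', 'b.nii', 'c.nii', 'd.nii']), batch_size=2)
for batch in loader:
    print(batch.shape)  # torch.Size([2, 1, 32, 32, 32]) -> 5D input for Conv3d
```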

So I really need help with how to combine input tensors into a batch.
I know about torch.cat, but it doesn’t work in my case, since I have to loop over the items in the batch to concatenate them.
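Or, if I already have the volumes as a list of equal-shaped tensors, is torch.stack (which adds a new batch dimension, unlike torch.cat) what I should be using? For example:

```python
import numpy as np
import torch

volumes = [np.random.rand(1, 32, 32, 32).astype(np.float32) for _ in range(4)]  # stand-ins
batch = torch.stack([torch.from_numpy(v) for v in volumes])  # new dim -> (4, 1, 32, 32, 32)
print(batch.shape)
```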

Thanks!! :sob: My graduation depends on this project:sob::sob::sob:

Solved it! It was because of the PReLU (though I’m not sure why). I changed it to ReLU and it works fine now :grinning:

Still need help with the batch problem…:sob:

Next time, please don’t write messages like this. It is not helpful and does not add to the discussion.


Ok, sorry about that.

About batch norm, if all elements of your input are the same, the variance is zero, which might lead to the nans you were seeing before.
That seemed to be the case in your example.


Thank you so much for the help!
By “all elements of the input”, do you mean that the patches within a batch all contain the same elements, or that a single patch contains only one kind of element? Sorry for the dumb question. I’m a newbie.

Btw, this problem arises even when I didn’t use the batchNorm layer at all, which is why I’m so confused. I’ve also tried giving it a batch of all zeros (with BN) and it still works. It only stopped giving me nan when I changed the PReLU to ReLU…

If all elements in a tensor are the same (for a specific channel), then the variance will be zero, and this might lead to instabilities and nan/inf.
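You can check this directly on one of your inputs; for example (x here just stands in for one of your 5D batches):

```python
import torch

x = torch.zeros(2, 3, 8, 8, 8)  # a batch where every element is identical

# per-channel variance over the batch and spatial dims (what batch norm normalizes with)
per_channel = x.transpose(0, 1).reshape(x.size(1), -1)
print(per_channel.var(dim=1, unbiased=False))  # tensor([0., 0., 0.]) -> zero variance
```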

If you can isolate in ~20 lines of code a full example illustrating the nans, I can have a look at it.