nn.Conv2d bug with kernel_size = (3,3)

When the inputs are NaNs, I expect the outputs to be NaNs as well. But I found that an nn.Conv2d layer, specifically with kernel_size=(3, 3) in GPU mode, outputs -inf instead of NaN. Here is a short script to reproduce it.

import torch
import torch.nn as nn

x = torch.acos(1 + torch.ones(1, 1, 3, 3))  # acos(2) is undefined, so every entry is NaN
model = nn.Conv2d(1, 1, kernel_size=(3, 3))
y = model(x)
print('input_cpu:', x)
print('output_cpu:', y)

model_gpu = model.cuda()
x_gpu = x.cuda()
y_gpu = model_gpu(x_gpu)
print('input_gpu:', x_gpu)
print('output_gpu:', y_gpu)

output:

input_cpu: tensor([[[[nan, nan, nan],
          [nan, nan, nan],
          [nan, nan, nan]]]])
output_cpu: tensor([[[[nan]]]], grad_fn=<ThnnConv2DBackward>)
input_gpu: tensor([[[[nan, nan, nan],
          [nan, nan, nan],
          [nan, nan, nan]]]], device='cuda:0')
output_gpu: tensor([[[[-inf]]]], device='cuda:0', grad_fn=<CudnnConvolutionBackward>)

Is this happening only for me, or do others see the same issue?
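As a side note, eyeballing printed tensors can be error-prone. A more direct check is torch.isnan / torch.isinf; here is a small CPU-only sketch (it doesn't need a GPU, so anyone can run it):

```python
import torch
import torch.nn as nn

x = torch.full((1, 1, 3, 3), float('nan'))  # all-NaN input, same shape as above
conv = nn.Conv2d(1, 1, kernel_size=(3, 3))
y = conv(x)

# On the CPU path, NaN should propagate through the convolution.
print('any NaN in output:', torch.isnan(y).any().item())
print('any inf in output:', torch.isinf(y).any().item())
```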

Hi,

I don’t think there is any guarantee of what a conv will return if the input is NaN.
This is most likely GPU dependent: it returns NaN on my Titan Black. Which GPU do you use?

I am on a GeForce RTX 2070. So yours returned a NaN? It’s funny that only with kernel_size=(3, 3) do I get a -inf; with all other kernel sizes I get NaN as well.

The thing is that cuDNN uses different algorithms for different input/kernel sizes.
So I guess the algorithm used for small kernels behaves differently with NaNs.
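One way to test this hypothesis is to rerun the GPU convolution with cuDNN turned off via the torch.backends.cudnn.flags context manager, which routes the conv to PyTorch's native CUDA kernels instead. A sketch (guarded so it also runs on a CPU-only machine):

```python
import torch
import torch.nn as nn

x = torch.full((1, 1, 3, 3), float('nan'))  # all-NaN input
model = nn.Conv2d(1, 1, kernel_size=(3, 3))

# CPU reference: the native kernel propagates NaN.
print('cpu:', model(x))

if torch.cuda.is_available():
    model_gpu, x_gpu = model.cuda(), x.cuda()
    # Default path: cuDNN picks an algorithm based on the shapes involved.
    print('cudnn:', model_gpu(x_gpu))
    # Fallback path: with cuDNN disabled, PyTorch's own CUDA kernel runs instead.
    with torch.backends.cudnn.flags(enabled=False):
        print('no cudnn:', model_gpu(x_gpu))
```

If the -inf disappears on the fallback path, that points at the specific cuDNN algorithm rather than at GPU arithmetic in general.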


This is a cuDNN bug; @ezyang filed a bug with NVIDIA for a similar case.
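Until a fix lands, one possible workaround (at a performance cost, and not an officially recommended fix) is to disable cuDNN globally so that convolutions use PyTorch's native CUDA kernels:

```python
import torch
import torch.nn as nn

# Workaround sketch: disable cuDNN for the whole process.
# Convolutions fall back to PyTorch's native CUDA kernels (slower).
torch.backends.cudnn.enabled = False

x = torch.full((1, 1, 3, 3), float('nan'))
conv = nn.Conv2d(1, 1, kernel_size=(3, 3))
if torch.cuda.is_available():
    y = conv.cuda()(x.cuda())  # should now propagate NaN rather than produce -inf
    print(y)
```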
