Beginner question: model returns NaN

Hi, my model returns NaN. I'm using the torchvision datasets API to get the MNIST dataset.
Then I expand the 28x28 images into an extra dimension with

torch.einsum('i,jkl->jikl', vector, img)

where vector is an array of ones with zeros at the front and back.
The resulting shape is (B, C, D, H, W) = (1, 1, 28, 28, 28).
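
For reference, a minimal sketch of how this expansion could look end to end (the boundary width of four zero slices is my assumption, not stated in the original post):

import torch

# one MNIST sample after ToTensor(): (C, H, W) = (1, 28, 28)
img = torch.rand(1, 28, 28)

# ones with zeros at the front and back; the width of 4 is a guess
vector = torch.ones(28)
vector[:4] = 0.0
vector[-4:] = 0.0

# 'i,jkl->jikl' broadcasts the image along a new depth axis i,
# scaling each depth slice by vector -> (C, D, H, W) = (1, 28, 28, 28)
volume = torch.einsum('i,jkl->jikl', vector, img)

# add the batch dimension -> (B, C, D, H, W) = (1, 1, 28, 28, 28)
volume = volume.unsqueeze(0)
print(volume.shape)  # torch.Size([1, 1, 28, 28, 28])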

The model I'm using is defined as:

import torch
import torch.nn as nn

class ConvAutoEncoder4(torch.nn.Module):
    def __init__(self):
        super(ConvAutoEncoder4, self).__init__()
        channels = 4
        # encoder: (1, 28, 28, 28) -> (4, 7, 7, 7)
        self.fcn3 = nn.Conv3d(in_channels=1, out_channels=channels, kernel_size=4, stride=4)
        # two pointwise (1x1x1) convolutions in the bottleneck
        self.fcn4 = nn.Conv3d(in_channels=channels, out_channels=channels, kernel_size=1)
        self.fcn5 = nn.Conv3d(in_channels=channels, out_channels=channels, kernel_size=1)
        # decoder: (4, 7, 7, 7) -> (1, 28, 28, 28)
        self.fcn6 = nn.ConvTranspose3d(in_channels=channels, out_channels=1, kernel_size=4, stride=4)
        
    def forward(self, tensor):
        tensor = self.fcn3(tensor)
        tensor = torch.relu(tensor)
        print(tensor.shape)
        for slc in tensor[0][0]:  # debug: print every depth slice of the first channel
            print(slc)
        tensor = self.fcn4(tensor)
        ### NOW IT CONTAINS NANS???
        print(tensor.shape)
        for slc in tensor[0][0]:
            print(slc)
        tensor = torch.relu(tensor)
        tensor = self.fcn5(tensor)
        tensor = torch.relu(tensor)
        tensor = self.fcn6(tensor)
        tensor = torch.relu(tensor)
        return tensor

The NaNs already arise on the second image, usually around the second layer.
I don't understand why; the 1x1x1 kernel should be fine, right?
I tried this on the original 2D dataset before and that worked fine.
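
As a way to pin down which layer first produces non-finite values, here is a debugging sketch of my own (not from the original post; it assumes the ConvAutoEncoder4 class defined above):

import torch
import torch.nn as nn

def nan_hook(module, inputs, output):
    # report the first module whose output contains NaN or inf
    if not torch.isfinite(output).all():
        print(f'non-finite output after {module}')

mod = ConvAutoEncoder4()
for layer in mod.modules():
    if isinstance(layer, (nn.Conv3d, nn.ConvTranspose3d)):
        layer.register_forward_hook(nan_hook)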

When I test this with

import torch.optim as optim

it = iter(train_loader)
imgs = [next(it)[0] for i in range(100)]
print(len(imgs))

print(imgs[0].shape)

mod = ConvAutoEncoder4()
crtrn = nn.MSELoss()
ptmzr = optim.Adam(mod.parameters(), lr=1e-3)

torch.autograd.set_detect_anomaly(False)

ptmzr.zero_grad()
res = mod(imgs[0])
lss = crtrn(res, imgs[0])
lss.backward()
ptmzr.step()
print(lss.item())

ptmzr.zero_grad()
res = mod(imgs[1])
lss = crtrn(res, imgs[1])
lss.backward()
ptmzr.step()
print(lss.item())

The output is

100
torch.Size([1, 1, 28, 28, 28])
torch.Size([1, 4, 7, 7, 7])
tensor([[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]], grad_fn=<UnbindBackward>)
tensor([[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]], grad_fn=<UnbindBackward>)
tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0209, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]],
       grad_fn=<UnbindBackward>)
tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0074, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]],
       grad_fn=<UnbindBackward>)
tensor([[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]], grad_fn=<UnbindBackward>)
tensor([[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]], grad_fn=<UnbindBackward>)
tensor([[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]], grad_fn=<UnbindBackward>)
torch.Size([1, 4, 7, 7, 7])
tensor([[-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506]],
       grad_fn=<UnbindBackward>)
tensor([[-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506]],
       grad_fn=<UnbindBackward>)
tensor([[-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.5038, -0.4605, -0.4621, -0.4506],
        [-0.4506, -0.4506, -0.4553, -0.4688, -0.4506, -0.4613, -0.4506],
        [-0.4506, -0.4518, -0.5405, -0.4506, -0.4505, -0.4559, -0.4506],
        [-0.4506, -0.4724, -0.4928, -0.4506, -0.4550, -0.4908, -0.4506],
        [-0.4506, -0.4837, -0.4551, -0.4715, -0.4912, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506]],
       grad_fn=<UnbindBackward>)
tensor([[-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4650, -0.5366, -0.4510, -0.4506],
        [-0.4506, -0.4506, -0.4545, -0.5428, -0.4506, -0.4499, -0.4506],
        [-0.4506, -0.4520, -0.5970, -0.4506, -0.4505, -0.4499, -0.4506],
        [-0.4506, -0.5225, -0.5748, -0.4506, -0.4538, -0.5510, -0.4506],
        [-0.4506, -0.5077, -0.4728, -0.4642, -0.5557, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506]],
       grad_fn=<UnbindBackward>)
tensor([[-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4657, -0.5845, -0.4610, -0.4506],
        [-0.4506, -0.4506, -0.4750, -0.5655, -0.4506, -0.4499, -0.4506],
        [-0.4506, -0.4508, -0.5578, -0.4506, -0.4506, -0.4499, -0.4506],
        [-0.4506, -0.4942, -0.6176, -0.4506, -0.4762, -0.5629, -0.4506],
        [-0.4506, -0.4777, -0.5257, -0.5045, -0.5382, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506]],
       grad_fn=<UnbindBackward>)
tensor([[-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506]],
       grad_fn=<UnbindBackward>)
tensor([[-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506],
        [-0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506, -0.4506]],
       grad_fn=<UnbindBackward>)
0.03493741527199745
torch.Size([1, 4, 7, 7, 7])
tensor([[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]], grad_fn=<UnbindBackward>)
tensor([[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]], grad_fn=<UnbindBackward>)
tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0405, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0020, 0.0000, 0.0000, 0.0000]],
       grad_fn=<UnbindBackward>)
tensor([[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]], grad_fn=<UnbindBackward>)
tensor([[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]], grad_fn=<UnbindBackward>)
tensor([[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]], grad_fn=<UnbindBackward>)
tensor([[0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0.]], grad_fn=<UnbindBackward>)
torch.Size([1, 4, 7, 7, 7])
tensor([[nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan]], grad_fn=<UnbindBackward>)
tensor([[nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan]], grad_fn=<UnbindBackward>)
tensor([[nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan]], grad_fn=<UnbindBackward>)
tensor([[nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan]], grad_fn=<UnbindBackward>)
tensor([[nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan]], grad_fn=<UnbindBackward>)
tensor([[nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan]], grad_fn=<UnbindBackward>)
tensor([[nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan, nan]], grad_fn=<UnbindBackward>)
nan

The same happens when I use a DataLoader.

From a different question I found

torch.autograd.set_detect_anomaly(True)

and the resulting stack trace ends with

RuntimeError: Function 'SlowConv3DBackward' returned nan values in its 1th output.
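
For reference, anomaly detection can also be scoped to a single step with the context manager (a sketch reusing mod, crtrn, and imgs from the snippet above):

import torch

# flags the forward op that produced the non-finite value during backward
with torch.autograd.detect_anomaly():
    res = mod(imgs[0])
    lss = crtrn(res, imgs[0])
    lss.backward()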

Using

for name, param in model.named_parameters():
    print(name, torch.isfinite(param.grad).all())

gives this traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-105-bd15723eb4e7> in <module>
     12 
     13 ptmzr.zero_grad()
---> 14 res = mod(imgs[0])
     15 lss = crtrn(res, imgs[0])
     16 lss.backward()

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

<ipython-input-103-cb1d63b32440> in forward(self, tensor)
     20             print(slc)
     21         for name, param in self.named_parameters():
---> 22             print(name, torch.isfinite(param.grad).all())
     23         tensor = self.fcn4(tensor)
     24         for name, param in self.named_parameters():

TypeError: isfinite(): argument 'input' (position 1) must be Tensor, not NoneType

So the .grad of the previous layer is None?

The last error is raised because the .grad attribute is still set to None.
This can happen if you try to check the gradients before they were calculated in the first backward() call.
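
A minimal sketch of that ordering, reusing the names from the training snippet above (mod, crtrn, ptmzr, imgs); the None guard also covers parameters that never received a gradient:

ptmzr.zero_grad()
res = mod(imgs[0])
lss = crtrn(res, imgs[0])
lss.backward()  # .grad is only populated after this call

for name, param in mod.named_parameters():
    if param.grad is not None:
        print(name, torch.isfinite(param.grad).all())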

Thanks for the reply!
I deleted parts of the network and fed in only one image, and it still gave me the same error. I created a second question with the reduced code.
The error appears stochastically, even with seeding. I'm fairly certain the code does not use the gradients before calling backward() the first time. (The results are deterministic if it does not crash.)

You might have an issue in the data being fed in. For instance, if you took the images, preprocessed them in a spreadsheet, and exported them to CSV, it's possible one of the images got saved with quotes around its values. How are you loading the data?
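
A quick way to rule that out would be a finiteness check over the whole loader (a sketch assuming train_loader yields (image, label) pairs, as the torchvision MNIST loader does):

import torch

for i, (batch, _) in enumerate(train_loader):
    if not torch.isfinite(batch).all():
        print(f'non-finite values in batch {i}')
        break
else:
    print('all batches are finite')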