Pretrained VGG16 giving NaN output

Hi,

I am working with a pretrained VGG16 model to classify images, and I have appended my own layers to its classifier like this:

import torch.nn as nn
import torchvision

# Load the pretrained VGG16 and append extra fully connected layers to its classifier
model = torchvision.models.vgg16(pretrained=True)
model.classifier.add_module('7', nn.Linear(in_features=1000, out_features=500, bias=True))
model.classifier.add_module('8', nn.ReLU())
model.classifier.add_module('9', nn.Linear(in_features=500, out_features=100, bias=True))
model.classifier.add_module('10', nn.ReLU())
model.classifier.add_module('11', nn.Linear(in_features=100, out_features=67, bias=True))

When I start training and print the model’s output, I see it’s NaN, and I don’t know what’s going on. I’ve verified that my inputs don’t contain NaN. This is my dataloader:

import matplotlib.image as mpimg
import numpy as np
import torch
from torch.utils.data import Dataset

class dataload(Dataset):

    def __init__(self, x, transform=None):
        self.data = x
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        # Load the image, scale to [0, 1], and move channels first (C, H, W)
        img = mpimg.imread(self.data[i]) / 255.0
        img = img.transpose((2, 0, 1))
        img = torch.from_numpy(img).float()

        # The class index is encoded in the filename; build a one-hot label of size 67
        tmp = np.int32(self.data[i].split('/')[-1].split('_')[0][1])
        label = np.zeros(67)
        label[tmp] = 1
        label = torch.from_numpy(label).float()

        # if self.transform:
        #     sample = self.transform(sample)

        return img, label
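
For completeness, this is roughly how the inputs can be verified (a minimal sketch; the batch size and the filenames list of image paths are placeholders):

from torch.utils.data import DataLoader

# Hypothetical sanity check: iterate over the dataset once and assert no NaNs
dataset = dataload(filenames)            # filenames: list of image paths
loader = DataLoader(dataset, batch_size=32)

for imgs, labels in loader:
    assert not torch.isnan(imgs).any(), 'NaN found in input batch'
    assert not torch.isnan(labels).any(), 'NaN found in labels'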

What could be going on?

I don’t know what might be causing the issue, as I’m not able to reproduce the NaN outputs using your code and random inputs:

import torch
import torch.nn as nn
import torchvision.models as models

model = models.vgg16(pretrained=True)
model.classifier.add_module('7', nn.Linear(in_features=1000, out_features=500, bias=True))
model.classifier.add_module('8', nn.ReLU())
model.classifier.add_module('9', nn.Linear(in_features=500, out_features=100, bias=True))
model.classifier.add_module('10', nn.ReLU())
model.classifier.add_module('11', nn.Linear(in_features=100, out_features=67, bias=True))

# Feed random inputs and check whether any output contains NaN
for _ in range(100):
    x = torch.randn(1, 3, 224, 224)
    out = model(x)
    print(torch.isnan(out).any())

I don’t know either. When I was trying it, I was getting NaNs, so I broke the problem down: first I checked whether the plain pretrained model alone produced NaN. Sometimes it did, but mostly it didn’t.

Then I added the fully connected layers one at a time and checked the output after each, with the same result: sometimes NaN, but mostly a normal output.
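
To narrow down which layer produces the NaNs first, a forward hook on every submodule could report the culprit (a rough sketch, assuming the model defined above):

# Register a forward hook on every submodule that reports NaN outputs
def make_hook(name):
    def hook(module, inp, out):
        if isinstance(out, torch.Tensor) and torch.isnan(out).any():
            print(f'NaN in output of module: {name}')
    return hook

for name, module in model.named_modules():
    module.register_forward_hook(make_hook(name))

out = model(torch.randn(1, 3, 224, 224))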

I don’t know if there’s a dedicated way to debug this. What do you suggest? Maybe PyTorch needs a NaN detector.
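
For the backward pass, torch.autograd.set_detect_anomaly can act as exactly that kind of detector: it raises an error as soon as an operation produces NaN gradients, with a traceback pointing at the responsible forward op (it slows training considerably, so it’s meant for debugging only). A minimal sketch, assuming the model defined above:

import torch

# Enable anomaly detection: any backward op producing NaN raises a RuntimeError
torch.autograd.set_detect_anomaly(True)

out = model(torch.randn(1, 3, 224, 224))
loss = out.sum()    # placeholder loss, just for illustration
loss.backward()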