Possible local minima problem with my model

Hi, I am currently training a CNN for binary classification of polar bear images. After only a few epochs it classifies about 80% of my positive examples correctly, even on my validation set. However, if I show the trained model a blank white image, it outputs with high certainty that the image belongs to the polar bear class, which leads me to suspect that it predicts the output based on the number of white pixels in the input image. I thought about adding a lot of white images as negative examples to my data, but I'm not sure whether this would be the right approach.
My model looks like this:
import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self):  # needs the double underscores, not "init"
        super().__init__()
        # Layer sizes are illustrative; the original post omitted the definitions.
        self.conv1 = torch.nn.Conv2d(3, 32, kernel_size=3)
        self.conv2 = torch.nn.Conv2d(32, 64, kernel_size=3)
        self.dropout1 = torch.nn.Dropout2d(0.25)
        self.dropout2 = torch.nn.Dropout(0.5)
        self.fc1 = torch.nn.Linear(64 * 30 * 30, 128)  # depends on the input resolution
        self.fc2 = torch.nn.Linear(128, 1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.dropout1(x)
        x = F.celu(x)  # CELU was undefined; use the functional form
        x = self.conv2(x)
        x = F.celu(x)
        x = F.max_pool2d(x, 2)
        x = torch.flatten(x, 1)
        x = F.celu(x)
        x = self.fc1(x)
        x = F.celu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = torch.sigmoid(x)
        return output


Your idea sounds valid. If you suspect the model is focusing on the "wrong" feature, you could add negative samples such as pictures of snow without any bears.
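To illustrate the idea, here is a minimal sketch of generating near-white images to mix into the training set as negatives. The function name, image size, and noise level are all made up for the example; the shape just follows PyTorch's `(N, C, H, W)` convention:

```python
import torch

def make_white_negatives(n, channels=3, size=64):
    """Generate n near-white images to use as negative samples.

    A bit of noise keeps them from being pixel-identical copies.
    All sizes here are illustrative, not from the original post.
    """
    base = torch.ones(n, channels, size, size)
    noise = 0.05 * torch.rand(n, channels, size, size)
    return (base - noise).clamp(0.0, 1.0)

white_negs = make_white_negatives(16)
white_labels = torch.zeros(16, 1)  # label 0 = no bear
```

You could then concatenate these (or real bear-free snow photos, which would be even better) with your existing negatives before building the `DataLoader`.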
Also, you could use e.g. Captum to apply some visualization techniques, which might help you figure out what the strongest features in the inputs are.

I’ve experimented with Captum a bit and, generally speaking, my suspicion was confirmed, as this heatmap of a completely blank image shows.

However, I am quite confused, because for some input images the model seems to recognise the polar bear, but then seemingly neglects it, as this heatmap shows:

This seems quite counterintuitive to me, or am I misinterpreting something here?

Could you explain what you’ve visualized in these images?
Are more positive values corresponding to “more important” areas of the input image?
Is the first (snow only) image also classified as a bear? If so, could you post another figure showing a negative sample (no bear and classified correctly)?