CNN predicts same output after every epoch

SClarkPhysics · August 22, 2019, 12:46pm

I am sorry if this is naive, but I am learning PyTorch and machine learning as I go, and I am running into some trouble with my CNN. The model is meant to classify images based on a parameter called m/E, and I have looked at the images and confirmed that the images in the two categories are considerably different. The training and testing sets have both been balanced so that each of the two categories contains exactly 50% of the events. I am training on about 2000 images and testing on about 400.

No matter how many epochs I run, the model always predicts exactly the same outputs. Further, it tends to put all or nearly all images in the same category. I believe this is a training issue, so for now I will show only that part of the code.

Here is my CNN:

class CNN(nn.Module):
  def __init__(self, input_size, n_feature, output_size, pp=False):
    super(CNN, self).__init__()
    self.pp = pp
    self.n_feature = n_feature
    self.conv1 = nn.Conv2d(in_channels=1, out_channels=n_feature, kernel_size=5)
    self.conv2 = nn.Conv2d(n_feature, n_feature, kernel_size=5)
    self.fc1 = nn.Linear(n_feature*4*4, 50)
    self.fc2 = nn.Linear(50, output_size)

  def forward(self, x, verbose=False):
    x = self.conv1(x)
    x = F.relu(x)
    x = F.max_pool2d(x, kernel_size=2)
    x = self.conv2(x)
    x = F.relu(x)
    x = F.max_pool2d(x, kernel_size=2)
    x = x.view(-1, self.n_feature*4*4)
    x = self.fc1(x)
    x = F.relu(x)
    x = self.fc2(x)
    x = F.softmax(x, dim=1)
    return x
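
As a sanity check on the n_feature*4*4 input size of fc1, here is a minimal sketch of the shape arithmetic. It assumes 28x28 single-channel inputs (isize = 28 is an assumption; the value is not stated in this post) and the default stride and padding of nn.Conv2d and F.max_pool2d. The helper names are hypothetical.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_out(size, kernel_size):
    """Spatial size after max pooling with stride == kernel_size, no padding."""
    return size // kernel_size


def flattened_side(isize):
    """Side length of the feature map that reaches fc1 in the CNN above."""
    s = conv_out(isize, kernel_size=5)   # conv1
    s = pool_out(s, kernel_size=2)       # first max_pool2d
    s = conv_out(s, kernel_size=5)       # conv2
    s = pool_out(s, kernel_size=2)       # second max_pool2d
    return s


isize = 28  # assumed image side length, not given in the post
side = flattened_side(isize)
print(side)  # 4
```

For other image sizes the x.view(-1, self.n_feature*4*4) call would either raise an error or change the effective batch size, so the value of isize is worth confirming.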

And here is my Training Loop:

def train(epoch, model, perm=torch.arange(0, isize*isize).long()):
  model.train()
  for batch_idx, (data, target) in enumerate(train_loader):
    data, target = data.to(device), target.to(device)

    data = data.float()
    target = target.float()

    data = data.view(-1, isize*isize)
    data = data[:, perm]
    data = data.view(-1, 1, isize, isize)

    optimizer.zero_grad()
    output = model(data)
    loss = F.binary_cross_entropy(output, target, reduction='sum')
    loss.backward()
    optimizer.step()
    if batch_idx % 16 == 0:
      print('Train Epoch: {} [{}/{} ({:.0f}%)]\tloss: {:.6f}'.format(epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

And finally, how everything gets implemented:

model_cnn_train = CNN(input_size, n_features, output_size, pp=False)
model_cnn_train.to(device)
model_cnn_test = CNN(input_size, n_features, output_size, pp=False)
model_cnn_test.to(device)

optimizer = optim.SGD(model_cnn_train.parameters(), lr=0.01, momentum=0.)

print("Convolutional Neural Network")
print('Number of parameters: {}'.format(get_n_params(model_cnn_train)))

for epoch in range(0, nEpochs):

  truearray = []
  targarray = []
 
  train(epoch, model_cnn_train)
  print("\n\nTESTING TIME\n\n")

  test(model_cnn_test)
  tcount = 0
  for pp in targarray:
    if pp > 0.5:
      tcount += 1
  print("I guessed that there would be {} objects with m/E > {}\n".format(tcount,MoEthreshold))
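
One thing that can be read off the code above: train() updates model_cnn_train, while test() receives model_cnn_test, a separately constructed instance whose parameters the optimizer never touches. The following is a minimal sketch, using a small hypothetical nn.Linear model and random data rather than the CNN above, of how an untrained second instance keeps producing identical outputs and how copying the trained weights with load_state_dict changes that. The loss is cross_entropy on integer labels purely to keep the sketch short; all variable names are assumptions.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-ins for model_cnn_train / model_cnn_test.
model_train = nn.Linear(4, 2)
model_test = nn.Linear(4, 2)   # separate instance with its own weights

optimizer = optim.SGD(model_train.parameters(), lr=0.1)
data = torch.randn(8, 4)
target = torch.randint(0, 2, (8,))

before = model_test(data).detach().clone()

# A few training steps; only model_train's parameters are updated.
for _ in range(5):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model_train(data), target)
    loss.backward()
    optimizer.step()

after = model_test(data).detach().clone()
unchanged = torch.equal(before, after)   # the untrained copy has not moved

# Copy the trained weights across before evaluating.
model_test.load_state_dict(model_train.state_dict())
synced = torch.equal(model_test(data), model_train(data))

print(unchanged, synced)
```

In the loop above, the equivalent would be to call test(model_cnn_train), assuming test() only needs the model as its argument.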

Sample output for 3 training epochs looks like this:

Classifying based on MoE

Convolutional Neural Network
Number of parameters: 19784
Train Epoch: 0 [0/2012 (0%)]	loss: 41.824711
Train Epoch: 0 [512/2012 (25%)]	loss: 884.192932
Train Epoch: 0 [1024/2012 (51%)]	loss: 1049.979126
Train Epoch: 0 [1536/2012 (76%)]	loss: 828.930847

TESTING TIME

Test set: Average loss: 1.4427, Accuracy: 196/413 (47%)
I guessed that there would be 13 objects with m/E > 0.005

Train Epoch: 1 [0/2012 (0%)]	loss: 884.192932
Train Epoch: 1 [512/2012 (25%)]	loss: 773.668762
Train Epoch: 1 [1024/2012 (51%)]	loss: 663.144592
Train Epoch: 1 [1536/2012 (76%)]	loss: 828.930847

TESTING TIME

Test set: Average loss: 1.4427, Accuracy: 196/413 (47%)
I guessed that there would be 13 objects with m/E > 0.005

Train Epoch: 2 [0/2012 (0%)]	loss: 718.406677
Train Epoch: 2 [512/2012 (25%)]	loss: 552.620422
Train Epoch: 2 [1024/2012 (51%)]	loss: 663.144592
Train Epoch: 2 [1536/2012 (76%)]	loss: 773.668762

TESTING TIME

Test set: Average loss: 1.4427, Accuracy: 196/413 (47%)
I guessed that there would be 13 objects with m/E > 0.005

I am not sure why it is performing so poorly, or why the predictions don’t change after each epoch. I really appreciate any help, so thank you in advance. Let me know if you need to see more of the code or know more about my environment.