I made a very simple 3-layer fully-connected network for binary classification (using the NASA C-MAPSS dataset to classify healthy and faulty turbofan engines). The input is a vector of length 26 and the output is a single sigmoid activation. The task is pretty easy: a basic logistic regression model gives me 100% test accuracy. I'm porting this code from Keras, where everything worked as expected, but when I run the PyTorch version the loss doesn't change. Running the exact same code several times, most of the time I get that non-changing loss; occasionally it works and converges within a couple of epochs to 100% test accuracy, as expected. I tried using the Keras settings (learning rate, Adam parameters, weight initialization), yet the problem persisted. After setting the random seed (without the default weight initialization this time) I get the same result on every run (obviously!): the loss still doesn't change within a run, but it differs from seed to seed. I had to try different seeds until I found one that actually works; with it the model trains as expected and runs correctly every time (I set the seed to 527; other values may have worked, but that's the only one I found).
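For reference, this is roughly how I fix the seed (a minimal sketch, assuming only the torch and numpy RNGs matter here):

import numpy as np
import torch

seed = 527
torch.manual_seed(seed)  # seeds PyTorch's RNG (weight init, shuffling)
np.random.seed(seed)     # assuming numpy is used in the data-processing code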
What could be causing this behavior?
Here is my code and training process (the data-processing code is a bit long, and I'm sure it's not the problem):
import torch
import torch.nn as nn
import torch.optim as optim

class CMAPSSBinaryClassifier(nn.Module):
    """Small fully-connected network: 26 -> 16 -> 4 -> 1 with a sigmoid output."""
    def __init__(self):
        super(CMAPSSBinaryClassifier, self).__init__()
        self.fc1 = nn.Linear(26, 16)
        self.fc2 = nn.Linear(16, 4)
        self.fc3 = nn.Linear(4, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        output = torch.sigmoid(self.fc3(x))  # probability of the "faulty" class
        return output
data_path = "/home/abdeljalil/Workspace/Datasets/CMAPSS/"
data_FD01 = CMAPSSDataset(data_path, fd_number=1)
model_FD01 = CMAPSSBinaryClassifier()

# tried the Xavier weight initialization scheme
#model_FD01.apply(init_weights)
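# (a minimal sketch of the init_weights helper I tried: Xavier/Glorot-uniform
# weights with zero biases, i.e. the Keras defaults)
def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)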
loader_train, loader_test = data_FD01.construct_binary_classification_data(good_faulty_threshould=30, batch_size=64)
epochs = 100

# tried Adam with a wide range of learning rates (eps=1e-7 matches the Keras default)
optimizer = optim.Adam(model_FD01.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-7)
# tried SGD with different learning rates too
#optimizer = optim.SGD(model_FD01.parameters(), lr=0.001, momentum=0.9)

model_FD01.train()
criterion = nn.BCELoss()
for epoch in range(epochs):
    correct = 0
    for batch_id, (data, target) in enumerate(loader_train):
        optimizer.zero_grad()
        output = model_FD01(data)
        output = output.view_as(target)  # match the target shape for BCELoss
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        # round probabilities to 0/1 to count correct predictions
        output = torch.round(output)
        correct += output.eq(target).sum().item()
    print('Train epoch: {}\t Accuracy ({:.0f}%)\tLoss: {:.6f}'.format(epoch, 100. * correct/len(loader_train.dataset), loss.item()))
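The test accuracy I mention is computed along these lines (a sketch, assuming loader_test yields (data, target) pairs like loader_train):

model_FD01.eval()
correct = 0
with torch.no_grad():  # no gradients needed at evaluation time
    for data, target in loader_test:
        output = model_FD01(data).view_as(target)
        correct += torch.round(output).eq(target).sum().item()
print('Test accuracy: {:.0f}%'.format(100. * correct / len(loader_test.dataset)))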
Here is a sample output (first 6 epochs) when the model doesn't converge (roughly 90% of the time); note that the loss is frozen at 13.815515, suspiciously close to -ln(1e-6):
Train epoch: 0 Accuracy (50%) Loss: 13.815515
Train epoch: 1 Accuracy (50%) Loss: 13.815515
Train epoch: 2 Accuracy (50%) Loss: 13.815515
Train epoch: 3 Accuracy (50%) Loss: 13.815515
Train epoch: 4 Accuracy (50%) Loss: 13.815515
Train epoch: 5 Accuracy (50%) Loss: 13.815515
Sometimes the loss changes but gets stuck around 0.69:
Train epoch: 0 Accuracy (50%) Loss: 0.704716
Train epoch: 1 Accuracy (50%) Loss: 0.701211
Train epoch: 2 Accuracy (50%) Loss: 0.698781
Train epoch: 3 Accuracy (50%) Loss: 0.697099
Train epoch: 4 Accuracy (50%) Loss: 0.695932
Train epoch: 5 Accuracy (50%) Loss: 0.695122
And the rare times it actually works:
Train epoch: 0 Accuracy (56%) Loss: 0.516986
Train epoch: 1 Accuracy (90%) Loss: 0.318052
Train epoch: 2 Accuracy (100%) Loss: 0.203251
Train epoch: 3 Accuracy (100%) Loss: 0.136395
Train epoch: 4 Accuracy (100%) Loss: 0.096920
Train epoch: 5 Accuracy (100%) Loss: 0.072752