Loss is not changing

fkucuk · March 19, 2019, 8:45am

I have implemented a simple MLP to train on a model. I’m using the “ignite” wrapper to simplify the process. However, the loss is not decreasing nor increasing. The code I’m using for the training is as follows:

class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.layers = nn.Sequential(nn.Linear(2048, 1024),
                                    nn.Tanh(),
                                    nn.Linear(1024, 978))
        
    def forward(self, x):
        x = self.layers(x)
        return x
    
model = MLP().double()
optimizer = torch.optim.Adam(model.parameters())
loss = torch.nn.MSELoss()

device = 'cpu'

if torch.cuda.is_available():
    device = 'cuda'

trainer = create_supervised_trainer(model = model, optimizer=optimizer, loss_fn=loss, device=device)
evaluator = create_supervised_evaluator(model, metrics={"MSE": MeanSquaredError(), "MAE" : MeanAbsoluteError()}, device=device)

tensor_x_train = torch.from_numpy(X_train)
tensor_y_train = torch.from_numpy(y_train)

train_dataset = utils.TensorDataset(tensor_x_train, tensor_y_train) 
train_loader = utils.DataLoader(train_dataset)

tensor_x_test = torch.from_numpy(X_test)
tensor_y_test = torch.from_numpy(y_test)

test_dataset = utils.TensorDataset(tensor_x_test, tensor_y_test) 
val_loader = utils.DataLoader(test_dataset)

@trainer.on(Events.ITERATION_COMPLETED)
def log_training_loss(engine):
    print("Epoch[{}] Loss: {:.2f}".format(engine.state.epoch, len(train_loader), engine.state.output))
    
@trainer.on(Events.EPOCH_COMPLETED)
def log_training_results(trainer):
    evaluator.run(train_loader)
    metrics = evaluator.state.metrics
    print("Training Results - Epoch: {}  Avg MSE: {:.2f} Avg MAE: {:.2f}"
          .format(trainer.state.epoch, metrics['MSE'], metrics['MAE']))

@trainer.on(Events.EPOCH_COMPLETED)
def log_validation_results(engine):
    evaluator.run(val_loader)
    metrics = evaluator.state.metrics
    print("Validation Results - Epoch: {}  Avg MSE: {:.2f} Avg MAE: {:.2f}"
          .format(engine.state.epoch, metrics['MSE'], metrics['MAE']))

trainer.run(train_loader, max_epochs=100)

I did checked the grads of the parameters and they are non-zero values. However, the loss is not changing. Where did I gone wrong?

Thanks in advance.

LeviViana · March 19, 2019, 8:53am

Is your training data normalized?

fkucuk · March 19, 2019, 8:57am

I think it doesn’t need to be. Values of the features are on the same scale and are binary. The ground truth values are ranging between -0.5 and 0.5 and are in the same scale. Should I still normalize it?

Edit: Diddn’t normalize the input features since they are binary. Did normalize the ground truth values between 0 and 1. Still there is no change.

LeviViana · March 19, 2019, 9:08am

The ground-truth values don’t need to be normalized. It’s hard to tell if it is a normalization problem without seeing the data. You should maybe give it a try, since it is a simple operation.

There are a lot of stuff that might be happening. Here are some general tips:

Are you sure your labels are correct?
Did you try other learning rates? smaller? bigger?
Did you try other architectures? You might be suffering from under-fitting.
Did you try other losses?

fkucuk · March 19, 2019, 9:13am

I’m not sure, how to check it properly? It seems correct to me.
Yeah done that, also tried with different optimizers.
Yeah done that, nothing changed.
Tried L1 loss too. Nothing changed.