Training validation errors converge

I define this simple network:

import torch.nn as nn

class DenseModel(nn.Module):
    def __init__(self, inputSize, hiddenSize, outputSize, numLayers, p):
        super().__init__()                  # Required before registering submodules.
        self.numLayers = numLayers          # Stored so forward() can read it.
        self.i2h = nn.Linear(inputSize, hiddenSize)
        self.relu = nn.ReLU()
        self.h2h = nn.Linear(hiddenSize, hiddenSize)
        self.h2o = nn.Linear(hiddenSize, outputSize)
        self.softmax = nn.Softmax(dim=1)    # Normalize output to probs.
        self.dropout = nn.Dropout(p)

    def forward(self, x):
        out = self.i2h(x)
        out = self.relu(out)
        out = self.dropout(out)
        # Apply the same (shared-weight) hidden layer numLayers times.
        for _ in range(self.numLayers):
            out = self.h2h(out)
            out = self.relu(out)
            out = self.dropout(out)
        out = self.h2o(out)
        out = self.softmax(out)
        return out

The output has two classes, the loss is nn.CrossEntropyLoss, and I’m optimizing with SGD like so:

for epoch in range(epochs):

    tl = 0   
    vl = 0  
    batchCounter = 0
    v = 0   

    for k in trainBatches:


        input_ = Variable(torch.FloatTensor(XTY[k:k+batchSize,:-1]))
        target_ = Variable(torch.FloatTensor(XTY[k:k+batchSize,-1:])) # Last position
        # Forward
        output_ = nn1.forward(input_)

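        # Backward / Optimize
        optimizer.zero_grad()   # Clear gradients left over from the previous batch
        loss = lossFn(output_, torch.max(target_, 1)[0].long())
        loss.backward()     # Backprop
        optimizer.step()    # Gradient descent

        tl += loss.item()   # Accumulate training loss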

    for v in valBatches:


        input_ = Variable(torch.FloatTensor(XTY[v:v+batchSize,:-1]))
        target_ = Variable(torch.FloatTensor(XTY[v:v+batchSize,-1:])) # Last position
        # Forward
        output_ = nn1.forward(input_)
        loss = lossFn(output_, torch.max(target_, 1)[0].long())
        vl += loss.item()   # Accumulate validation loss

        batchCounter += 1

    lossT = np.append(lossT, tl/len(trainBatches)) # Log training loss
    lossV = np.append(lossV, vl/len(valBatches)) # Log validation loss

My training and validation errors converge, even after 500 epochs:


I can’t find anything helpful. Any ideas?


nn.CrossEntropyLoss expects raw logits as the input as it internally calls F.log_softmax and nn.NLLLoss on the model output.
Just remove the self.softmax(out) call in your model’s forward and run it again.
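
To illustrate the point about logits, here is a minimal sketch (the tensor values are made up for illustration) that compares nn.CrossEntropyLoss with F.log_softmax followed by F.nll_loss, and shows what happens when softmax outputs are passed in instead of raw logits:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 2)             # hypothetical raw model outputs (4 samples, 2 classes)
target = torch.tensor([0, 1, 1, 0])    # hypothetical class indices

loss_fn = nn.CrossEntropyLoss()

# Intended usage: pass raw logits.
ce = loss_fn(logits, target)

# Equivalent manual computation: log_softmax followed by the NLL loss.
manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

# Problematic usage: softmax in the model, then softmax again inside the loss.
double = loss_fn(F.softmax(logits, dim=1), target)

print(ce.item(), manual.item(), double.item())
```

With two classes the softmax outputs lie in [0, 1], so the double-softmax loss is squeezed into roughly [0.31, 1.31] and cannot approach zero, however confident the model is.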

Also some minor tips for your code:

  • Variables are deprecated since 0.4.0. You can just use tensors instead, you don’t have to wrap them anymore.
  • Call the model directly as nn1(input_) instead of the forward() method, as this will make sure to properly register all hooks.
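
Both tips can be seen in a small sketch. The nn.Linear layer below is only a stand-in for the real model, and the hook is a hypothetical one added to make the difference visible:

```python
import numpy as np
import torch
import torch.nn as nn

model = nn.Linear(3, 2)                 # stand-in for the real model
calls = []
# A forward hook that records each time it is triggered.
model.register_forward_hook(lambda module, inputs, output: calls.append(1))

# Build the input tensor directly from a NumPy array; no Variable wrapper needed.
x = torch.from_numpy(np.ones((4, 3), dtype=np.float32))

model.forward(x)                        # bypasses __call__, so the hook is skipped
hooks_after_forward = len(calls)

model(x)                                # goes through __call__, so the hook runs
hooks_after_call = len(calls)

print(hooks_after_forward, hooks_after_call)
```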

@ptrblck Sorry for the delayed answer. I followed the suggestion but removing that line of code does not help, still getting same results. I left it running for a long time just to make sure it isn’t something silly like that, same result:

Anything else that pops out? Bit at a loss here. Thanks.

As a side note: I’m trying to overfit my model on a small dataset (10k samples) with 200 neurons (1 layer) and I can’t (same pattern as above). Is there anything wrong in the definition of the model itself?

I don’t know what kind of data you are using, but your model might just not have enough capacity.
Try to overfit using a smaller sample size, e.g. 10 samples and see how the training loss behaves.
If that looks good, you could add some more samples and see if the model is still able to learn the data.
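
A rough sketch of such an overfitting check, assuming random stand-in data (10 samples, 20 features) and an arbitrary learning rate. Dropout is left out here, since it works against memorizing the samples:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in data: 10 samples, 20 features, binary labels.
X = torch.randn(10, 20)
y = torch.randint(0, 2, (10,))

# Small network that returns raw logits (CrossEntropyLoss applies log_softmax itself).
model = nn.Sequential(nn.Linear(20, 200), nn.ReLU(), nn.Linear(200, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(500):
    optimizer.zero_grad()               # reset gradients from the previous step
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

If the training loss on these few samples does not drop towards zero, the problem is more likely in the model or training code than in the data.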

Yes it works with a small sample, so model is learning.


What kind of model did you use for the 10k samples?
As your model is quite simple could you just increase the number of neurons in the hidden layer and see how the training loss behaves?

I realized that my classes are very highly imbalanced and the model was behaving as expected by classifying always as the dominant class, that’s why the errors were converging to the observed proportion of cases in the minority class. After resampling (undersample majority class, oversample minority class) and adding weights to the loss function to correct persisting imbalances my results now look like this:
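
For readers hitting the same issue, here is a rough sketch of the two remedies, assuming a hypothetical 90/10 label split. The inverse-frequency weighting is one common heuristic and not necessarily the exact scheme used above:

```python
import torch
import torch.nn as nn
from torch.utils.data import WeightedRandomSampler

# Hypothetical labels: 90 samples of class 0, 10 samples of class 1.
labels = torch.tensor([0] * 90 + [1] * 10)
counts = torch.bincount(labels, minlength=2).float()

# Error rate of a model that always predicts the majority class:
# it equals the proportion of the minority class.
majority_error = 1.0 - counts.max().item() / len(labels)

# Inverse-frequency class weights for the loss (one common heuristic).
class_weights = len(labels) / (2 * counts)
loss_fn = nn.CrossEntropyLoss(weight=class_weights)

# Per-sample weights so that minority samples are drawn more often (oversampling).
sample_weights = class_weights[labels]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)

print(majority_error, class_weights)
```

The sampler can then be passed to a DataLoader through its sampler argument, so that each batch is drawn with these per-sample weights.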


Thanks for your time @ptrblck
