LBFGS not working on NN, loss not decreasing

Hi all, I am trying to compare different optimizers on a NN. However, the L-BFGS algorithm does not work and I don’t know why: the loss is not decreasing and my accuracy is very bad. SGD and Adam do work, so I wonder where my mistake is.

Here is my code:

#Load packages
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable
import torch.nn.functional as F

"""Load MNIST dataset"""
train_dataset = dsets.MNIST(root='./data', train=True,
                            transform=transforms.ToTensor(),
                            download=True)
test_dataset = dsets.MNIST(root='./data', train=False,
                           transform=transforms.ToTensor())

"""Make dataset iterable"""
batch_size = 1000

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

"""Create model class"""
class FFN(nn.Module):
    def __init__(self):
        super(FFN, self).__init__()
        # Linear functions
        self.fc1 = nn.Linear(28*28, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        out = F.relu(self.fc1(x))  # Non-linearity, can be changed to Tanh, ReLU
        out = F.relu(self.fc2(out))
        # Linear function (readout)
        out = self.fc3(out)
        return out

"""Instantiate Model and Optimizer"""
model = FFN()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.LBFGS(model.parameters())

"""Train Model"""
epochs = 5
for epoch in range(epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load images as Variables
        images = Variable(images.view(-1, 28*28))
        labels = Variable(labels)

        def closure():
            # Clear gradients, not be accumulated
            optimizer.zero_grad()

            # Forward pass to get output
            outputs = model(images)

            # Calculate Loss: softmax + cross entropy loss
            loss = criterion(outputs, labels)

            # Get gradients
            loss.backward()
            return loss

        # Update parameters
        optimizer.step(closure)

        print('Epoch: {}, Loss: {}'.format(epoch, loss.data[0]))

        if (i+1) % 100 == 0:

            # Calculate accuracy on testset
            correct = 0
            total = 0
            # Iterate through test data set
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = Variable(images.view(-1, 28*28))

                # Forward pass only to get output
                outputs = model(images)

                # Get prediction
                _, predicted = torch.max(outputs.data, 1)

                # Total number of labels
                total += labels.size(0)

                # Total correct predictions
                correct += (predicted == labels).sum()

            accuracy = 100 * correct / total

            # Print
            print('Epoch: {}, Loss: {}, Accuracy on testset: {}'.format(epoch, loss.data[0], accuracy))

Try lowering the learning rate to lr=0.1.
I stopped the training after some iterations and the test accuracy was ~95%.

I lowered it in my code, but I still always get the same training loss of 56198.06640625, and my test accuracy is 9.8%. Increasing the batch_size, for example, does not help either.

That’s strange. I only lowered the learning rate to 0.1 and assigned the loss returned by the optimizer, since that was missing in your code: loss = optimizer.step(closure).
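For reference, here is a minimal sketch of that pattern. It is written against a recent PyTorch release rather than 0.3.1, and it uses random tensors in place of MNIST so that it runs without a download. The small model, the synthetic batch, and the helper name `train_lbfgs` are assumptions made for illustration; only `lr=0.1` and `loss = optimizer.step(closure)` come from the posts above.

```python
import torch
import torch.nn as nn


def train_lbfgs(steps=5, lr=0.1, seed=0):
    """Run a few L-BFGS steps on synthetic data and return the loss per step."""
    torch.manual_seed(seed)
    # Synthetic stand-in for one MNIST-shaped batch (assumption, not real data)
    images = torch.randn(64, 28 * 28)
    labels = torch.randint(0, 10, (64,))

    model = nn.Sequential(nn.Linear(28 * 28, 32), nn.ReLU(), nn.Linear(32, 10))
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.LBFGS(model.parameters(), lr=lr)

    losses = []
    for _ in range(steps):
        def closure():
            # L-BFGS may evaluate the closure several times within one step
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            return loss

        # step() returns the loss from the first closure evaluation of this step
        loss = optimizer.step(closure)
        losses.append(loss.item())
    return losses


if __name__ == "__main__":
    for step, value in enumerate(train_lbfgs()):
        print('Step: {}, Loss: {}'.format(step, value))
```

Without the assignment, `loss` only exists inside `closure`, so the print statement in the training loop has nothing valid to report.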

Maybe I was lucky? I’ll try it with different seeds to check if the approach is reliable.

I tested it with different seeds and it seems to work:

Epoch: 0, Loss: 2.30485224724
Epoch: 0, Loss: 1.00339114666
Epoch: 0, Loss: 0.672091126442
Epoch: 0, Loss: 0.501866996288
...
Epoch: 0, Loss: 0.158133521676
Test accuracy: 95%
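A seed check like the one described above could look roughly like this. It is a sketch for a recent PyTorch release, with a small synthetic classification problem instead of MNIST; the helper name `final_loss_for_seed`, the model size, and the data shapes are made up for illustration and are not the code that produced the numbers above.

```python
import torch
import torch.nn as nn


def final_loss_for_seed(seed, steps=3, lr=0.1):
    """Train a small model with L-BFGS under a fixed seed; return the last loss."""
    torch.manual_seed(seed)  # fixes both the weight init and the synthetic data
    x = torch.randn(32, 20)
    y = torch.randint(0, 3, (32,))
    model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 3))
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.LBFGS(model.parameters(), lr=lr)

    def closure():
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        return loss

    for _ in range(steps):
        optimizer.step(closure)

    # Evaluate once more after the last update, without tracking gradients
    with torch.no_grad():
        return criterion(model(x), y).item()


if __name__ == "__main__":
    for seed in range(5):
        print('Seed: {}, Final loss: {}'.format(seed, final_loss_for_seed(seed)))
```

If the final loss is finite and similar across seeds, a lucky initialization is an unlikely explanation.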

Which PyTorch version are you using?

True, I was missing that loss = optimizer.step(closure). But I still get

Epoch: 0, Loss: 2.307893753051758
Epoch: 0, Loss: 2.3076794147491455
Epoch: 0, Loss: nan
Epoch: 0, Loss: nan

and the accuracy of 9.8%. It’s super strange that it is working for you and not for me.

And I’m using version 0.3.1.post2, so it’s up to date.

On my main machine I’ve compiled from master, so I installed 0.3.1.post2 on another machine and got the same results.

Could you please run this code on your machine and report the results?

I copied your code and only had to change root to root = './data' because otherwise the access was denied.

But I still get
Epoch: 0, Loss: 2.304852247238159
Epoch: 0, Loss: 2.308029890060425
Epoch: 0, Loss: nan
Epoch: 0, Loss: nan
Epoch: 0, Loss: nan
Epoch: 0, Loss: nan
Epoch: 0, Loss: nan
Epoch: 0, Loss: nan
Epoch: 0, Loss: nan

And I really don’t get it, because SGD and Adam work perfectly fine. And if it is working for you, why not for me?
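One way to narrow down where the NaN first appears is to check the returned loss after every step and stop at the first non-finite value. The following is a sketch for a recent PyTorch release; the helper names `first_nonfinite_step` and `run_with_nan_guard` are hypothetical, and `batches` is assumed to be any iterable of `(inputs, labels)` pairs such as a `DataLoader`.

```python
import math

import torch
import torch.nn as nn


def first_nonfinite_step(losses):
    """Return the index of the first NaN/inf loss, or None if all are finite."""
    for index, value in enumerate(losses):
        if not math.isfinite(value):
            return index
    return None


def run_with_nan_guard(model, criterion, optimizer, batches):
    """Step through batches, stopping as soon as the reported loss is NaN or inf."""
    history = []
    for images, labels in batches:
        def closure():
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            return loss

        # step() reports the loss computed before this step's update, so a NaN
        # here points at the parameters left behind by an earlier step
        loss = optimizer.step(closure)
        history.append(loss.item())
        if not math.isfinite(history[-1]):
            break  # stop and inspect instead of printing nan repeatedly
    return history


if __name__ == "__main__":
    # Loss values copied from the log above
    logged = [2.304852247238159, 2.308029890060425, float('nan')]
    print('First non-finite step: {}'.format(first_nonfinite_step(logged)))
```

For the log above, the first non-finite value is at index 2, which means the update made during the second step is the one to look at.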

Ok, that’s strange. Maybe someone else could try to reproduce this issue?

I found the mistake: I had Python 3.5 installed in Anaconda, and I needed Python 3.6.
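Since the cause turned out to be the environment rather than the code, it can help to include both versions when reporting a problem like this. A small sketch; the helper names are made up for illustration, and the PyTorch import is wrapped so the script still runs where PyTorch is not installed.

```python
import sys


def python_version_string():
    """Return the running interpreter version as 'major.minor.micro'."""
    return '{}.{}.{}'.format(*sys.version_info[:3])


def torch_version_string():
    """Return torch.__version__, or None if PyTorch cannot be imported."""
    try:
        import torch
    except ImportError:
        return None
    return torch.__version__


if __name__ == "__main__":
    print('Python: {}'.format(python_version_string()))
    print('PyTorch: {}'.format(torch_version_string()))
```

Running this inside the same Anaconda environment as the training script shows which interpreter is actually in use.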