LBFGS not working on NN, loss not decreasing

Desi20 · April 10, 2018, 1:38pm

Hi all, I am trying to compare different optimizers on a NN. However, the L-BFGS algorithm does not work and I don’t know why. The loss is not decreasing and my accuracy is very bad. SGD and Adam do work, so I wonder where my mistake is.

Here is my code:

#Load packages
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable
import torch.nn.functional as F

"""Load MNIST dataset"""
train_dataset = dsets.MNIST(root = './data', train=True,
                            transform = transforms.ToTensor(),
                            download = True)
test_dataset = dsets.MNIST(root = './data', train=False,
                           transform = transforms.ToTensor())

"""Make dataset iterable"""
batch_size=1000

train_loader = torch.utils.data.DataLoader(dataset = train_dataset, batch_size = batch_size,
                                           shuffle = True)

test_loader = torch.utils.data.DataLoader(dataset = test_dataset, batch_size = batch_size,
                                          shuffle = False)

"""Create model class"""
class FFN(nn.Module):
    def __init__(self):
        super(FFN, self).__init__()
        #Linear functions
        self.fc1 = nn.Linear(28*28, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        out = F.relu(self.fc1(x)) #Non-linearity, can be changed to Tanh,ReLu
        out = F.relu(self.fc2(out))
        #Linear function (readout)
        out = self.fc3(out)
        return out

"""Instantiate Model and Optimizer"""
model = FFN()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.LBFGS(model.parameters())

"""Train Model"""
epochs = 5
for epoch in range(epochs):
    for i, (images, labels) in enumerate(train_loader):
        #Load images as Variables
        images = Variable(images.view(-1, 28*28))
        labels = Variable(labels)

        def closure():
            #Clear gradients, not be accumulated
            optimizer.zero_grad()

            #Forward pass to get output
            outputs = model(images)

            #Calculate Loss: softmax + cross entropy loss
            loss = criterion(outputs, labels)

            #Get gradients
            loss.backward()
            return loss

        #update parameters
        optimizer.step(closure)

        print('Epoch: {}, Loss: {}'.format(epoch, loss.data[0]))

        if (i+1) % 100 == 0:

            #Calculate accuracy on testset
            correct = 0
            total = 0
            #Iterate through test data set
            for images, labels in test_loader:
                #Load images to a Torch Variable
                images = Variable(images.view(-1, 28*28))

                #Forward pass only to get output
                outputs = model(images)

                #Get prediction
                _, predicted = torch.max(outputs.data,1)

                #total number of labels
                total += labels.size(0)

                #Total correct predictions
                correct += (predicted ==labels).sum()

            accuracy = 100*correct /total

            #Print
            print('Epoch: {}, Loss: {}, Accuracy on testset: {}'.format(epoch, loss.data[0], accuracy))

ptrblck · April 10, 2018, 7:54pm

Try to lower the learning rate to lr=0.1.
I stopped the training after some iterations and the test accuracy was ~95%.

Desi20 · April 10, 2018, 9:38pm

I lowered it in my code, but I still always get the same loss on the training set, 56198.06640625, and my test accuracy is 9.8%. Increasing the batch_size, for example, does not help either.

ptrblck · April 10, 2018, 9:42pm

That’s strange. I just lowered the learning rate to 0.1 and returned the loss, since it was missing in your code: loss = optimizer.step(closure).
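For reference, here is a minimal sketch of the two changes, assuming a recent PyTorch release (so no Variable wrapper) and a small synthetic batch instead of MNIST. The tensor shapes and the tiny nn.Sequential model are only placeholders for illustration, not the model from the original post. As far as I know, LBFGS may call the closure several times inside a single step(), which is why it needs the closure in the first place.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in for one mini-batch (assumption: not the MNIST data from the thread)
inputs = torch.randn(64, 20)
targets = torch.randint(0, 3, (64,))

# Placeholder model, much smaller than the FFN in the original post
model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 3))
criterion = nn.CrossEntropyLoss()

# Lowered learning rate, as suggested above
optimizer = torch.optim.LBFGS(model.parameters(), lr=0.1)

def closure():
    # Clear old gradients, recompute the loss, and backpropagate
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    return loss

losses = []
for step in range(5):
    # Capture the value returned by step(); 'loss' inside closure() is local to it
    loss = optimizer.step(closure)
    losses.append(loss.item())
    print('Step: {}, Loss: {}'.format(step, loss.item()))
```

Without the assignment, the name loss used in the print statement is never defined in the training loop, because the variable inside closure() is not visible outside of it.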

Maybe I was lucky? I’ll try it with different seeds to check if the approach is reliable.

ptrblck · April 10, 2018, 10:01pm

I tested it with different seeds and it seems to work:

Epoch: 0, Loss: 2.30485224724
Epoch: 0, Loss: 1.00339114666
Epoch: 0, Loss: 0.672091126442
Epoch: 0, Loss: 0.501866996288
...
Epoch: 0, Loss: 0.158133521676
Test accuracy: 95%

Which PyTorch version are you using?

Desi20 · April 11, 2018, 6:18am

True, I was missing that loss = optimizer.step(closure). But I still get

Epoch: 0, Loss: 2.307893753051758
Epoch: 0, Loss: 2.3076794147491455
Epoch: 0, Loss: nan
Epoch: 0, Loss: nan

and an accuracy of 9.8%. It’s super strange that it is working for you and not for me.

And I’m using version 0.3.1.post2, so it’s up to date.

ptrblck · April 11, 2018, 8:34am

On my main machine I’ve compiled from master, so I installed 0.3.1.post2 on another machine and got the same results.

Could you please run this code on your machine and report the results?

Desi20 · April 11, 2018, 8:56am

I copied your code and only had to change root to root = './data' because otherwise the access was denied.

But I still get
Epoch: 0, Loss: 2.304852247238159
Epoch: 0, Loss: 2.308029890060425
Epoch: 0, Loss: nan
Epoch: 0, Loss: nan
Epoch: 0, Loss: nan
Epoch: 0, Loss: nan
Epoch: 0, Loss: nan
Epoch: 0, Loss: nan
Epoch: 0, Loss: nan

And I really don’t get it, because SGD and Adam work perfectly fine. And if it is working for you, why not for me?
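To pin down the step at which the loss first turns into nan, a small check like the following might help. It is only a sketch in plain Python: first_non_finite is a hypothetical helper name, and the list holds the first few values from the log above.

```python
import math

def first_non_finite(losses):
    """Return the index of the first NaN or infinite loss, or None if all are finite."""
    for index, value in enumerate(losses):
        if not math.isfinite(value):
            return index
    return None

# First few loss values from the log above
reported = [2.304852247238159, 2.308029890060425, float('nan'), float('nan')]
print(first_non_finite(reported))  # prints 2
```

In a training loop, the same test could be applied to each loss value as it is produced, so that training stops as soon as the first bad value shows up.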

ptrblck · April 11, 2018, 9:23am

Ok, that’s strange. Maybe someone else could try to reproduce this issue?

Desi20 · April 11, 2018, 10:31am

I found the mistake: I had Python 3.5 installed in Anaconda and I needed Python 3.6.

RFoldes · October 30, 2024, 3:24pm

Hi,

I am sorry to re-open an old post, but this script puzzles me a bit.

Based on what I have read about L-BFGS, it should only be used with all of the data at once, whereas, as far as I can see, this example uses mini-batches.

Done this way, isn’t it updating the network parameters after each mini-batch instead of after the entire dataset has been processed?

Thanks.
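For comparison, a full-batch variant could be sketched roughly as below. As far as I understand, optimizer.step(closure) updates the parameters within that single call, so the script above does update them on every mini-batch. In this sketch the closure instead loops over the whole dataset and accumulates gradients before L-BFGS takes a step. The synthetic data, the placeholder model, and the name full_batch_closure are assumptions for illustration, and the code targets a recent PyTorch release rather than 0.3.1.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in dataset (assumption: not MNIST)
features = torch.randn(256, 20)
labels = torch.randint(0, 3, (256,))
loader = DataLoader(TensorDataset(features, labels), batch_size=64, shuffle=False)
num_samples = len(loader.dataset)

# Placeholder model for illustration
model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 3))
# Sum per-sample losses so that dividing by num_samples gives the dataset mean
criterion = nn.CrossEntropyLoss(reduction='sum')
optimizer = torch.optim.LBFGS(model.parameters(), lr=0.1)

def full_batch_closure():
    # Gradients are accumulated over every mini-batch before the optimizer uses them
    optimizer.zero_grad()
    total_loss = 0.0
    for batch_features, batch_labels in loader:
        loss = criterion(model(batch_features), batch_labels) / num_samples
        loss.backward()
        total_loss += loss.item()
    # Return the mean loss over the whole dataset as a tensor
    return torch.tensor(total_loss)

full_losses = []
for epoch in range(5):
    # One step() call per pass; the closure may still be evaluated several times inside it
    loss = optimizer.step(full_batch_closure)
    full_losses.append(loss.item())
    print('Epoch: {}, Loss: {}'.format(epoch, loss.item()))
```

The trade-off is that every closure evaluation is a full pass over the data, so each step() is much more expensive than in the mini-batch version.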