Adam 'resets' at each epoch

Hi everyone,
I am currently working on a feed-forward network for the MNIST dataset.
It has 784 inputs, two hidden layers of 100 nodes each, and a 10-node output layer.
I use PyTorch’s Adam optimizer and the CrossEntropyLoss loss function.
I started from this official PyTorch example and modified my way through it.
As for the train/test steps, my implementation shouldn’t be too different from that one.
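
For reference, the setup is roughly equivalent to this minimal sketch (the layer sizes and the loss are the ones described above; the ReLU activations and the learning rate are just placeholders):

import torch
import torch.nn as nn

# 784 -> 100 -> 100 -> 10 feed-forward network (ReLU and lr are placeholder choices)
model = nn.Sequential(
    nn.Linear(784, 100),
    nn.ReLU(),
    nn.Linear(100, 100),
    nn.ReLU(),
    nn.Linear(100, 10),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)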

My problem is the following:
The training loss shows a sharp jump at the start of every epoch, as if the Adam optimizer were somehow being reset (I think). Below is a plot of what happens:

I am not uploading any of my code yet, as my implementation is quite complex and I am working with some “unusual” constraints (hopefully not relevant to the problem at hand).
However, if someone is interested, I will work toward a minimal working example.

Any help would certainly be appreciated!

Thanks,
Davide

Are you storing and loading the model or optimizer in some way?
There have been a few threads on this topic in the past few weeks, and so far we’ve mostly found bugs in the training procedure.
I’m really interested in a working example.
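
One typical culprit is something that re-creates the model or the optimizer inside the epoch loop: a fresh Adam instance starts with empty running-moment buffers, so every epoch then behaves as if training had just begun. Schematically (placeholder names, not your code):

for epoch in range(1, epochs + 1):
    # Bug: a new Adam instance discards the exp_avg / exp_avg_sq buffers accumulated so far
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for data, target in train_loader:
        output = model(data)
        loss = criterion(output, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()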

Yes, I am storing the model, the optimizer, and all the other parameters in an ad hoc class:

class NNet:

    def __init__(self, name):
        self.name = name
        ...  # other config variables

        self.model = None
        self.criterion = None
        self.optimizer = None
        self.scheduler = None

        # lists used to store the training/validation average loss and accuracy
        self.train = []
        self.valid = []
        self.test = []

    # training step, defined as a class method
    def train_step(self, train_loader, epoch):

        self.model.train()
        if self.scheduler is not None:
            self.scheduler.step()

        for batch_idx, (data, label) in enumerate(train_loader):
            output = self.model(data)
            loss = self.criterion(output, label)

            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()

    # validation/test step, defined as a class method
    def test_step(self, test_loader):
        self.model.eval()
        test_loss = 0
        correct = 0
        with torch.no_grad():
            for data, label in test_loader:
                output = self.model(data)
                test_loss += self.criterion(output, label, reduction='sum').item()  # sum up batch loss
                pred = output.max(1, keepdim=True)[1]  # index of the max log-probability
                correct += pred.eq(label.view_as(pred)).sum().item()

        test_loss /= len(test_loader.dataset)
        correct /= len(test_loader.dataset)
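
The outer loop that drives these two methods is, stripped of the “unusual” parts, roughly of this shape (simplified sketch, not the actual code):

def run(self, train_loader, valid_loader, epochs):
    # model, criterion and optimizer are created once, before this loop
    for epoch in range(1, epochs + 1):
        self.train_step(train_loader, epoch)
        self.test_step(valid_loader)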

The class is initialized with some empty attributes (model, criterion, etc.), and then I create the model and store it in the class:

def generate_entry(name):

    entry = NNet(name)
    entry.model = _user_defined_model_(**entry.model_args)
    entry.criterion = torch.nn.functional.cross_entropy
    ...
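
The optimizer is stored in the class in the same spirit; a purely hypothetical version of that elided part (not my actual code) would be:

entry.optimizer = torch.optim.Adam(entry.model.parameters(), lr=1e-3)  # hypothetical; lr is a placeholder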

edit: found how to properly embed code :sweat_smile:

I am afraid that I won’t be able to provide one before next week.

He probably meant whether you save the model/optimizer to a file (for later use) and load it again afterwards.
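
That is, whether somewhere between epochs there is code along these lines (sketch; the file name is a placeholder):

# saving a checkpoint
torch.save({'model': model.state_dict(),
            'optimizer': optimizer.state_dict()}, 'checkpoint.pth')

# ...and restoring it later
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])

If only the model’s state_dict is saved and the optimizer is simply re-created after loading, Adam’s internal state is lost and each restart behaves like a freshly initialized optimizer.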