Hi everyone,
I am currently working on a feed-forward network for the MNIST dataset.
It has 784 inputs, two hidden layers of 100 nodes each, and a 10-node output layer.
I use PyTorch’s Adam optimizer and the CrossEntropyLoss loss function.
I started from this official PyTorch example and modified it as I went.
As for the train/test steps, my actual implementation shouldn’t differ much from that one.
My problem is the following:
the training loss jumps sharply at each epoch, as if the Adam optimizer were being reset in some way (I think). Below is a plot of what happens:
I am not uploading any of my code yet, as my implementation is quite complex and I am working under some “unusual” constraints (hopefully not relevant to the problem at hand).
However, if anyone is interested, I will work toward a minimal working example.
Are you storing and loading the model or optimizer in some way?
There have been several threads on this topic in the past few weeks, and so far the cause has usually turned out to be a bug in the training procedure.
I would be really interested in a minimal working example.
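For reference, a frequent cause of this exact symptom is re-creating the optimizer (or restoring only the model) at each epoch, which wipes Adam’s running moment estimates. A minimal sketch of checkpointing that preserves the optimizer state as well (the file name and the Linear stand-in model are placeholders, not your actual setup):

```python
import torch
import torch.nn as nn

model = nn.Linear(784, 10)  # stand-in for the actual network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Save BOTH state dicts, not just the model's
torch.save({
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
}, "checkpoint.pt")

# Restore into the existing objects so Adam keeps its moment buffers
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])
```

If instead a fresh `torch.optim.Adam(...)` is constructed every epoch, the first updates after each reset behave like the very first training steps, which would look exactly like periodic jumps in the loss curve.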
Yes, I am storing the model, the optimizer, and all the other parameters in an ad hoc class:
class NNet:
    def __init__(self, name):
        self.name = name
        . . .  # other conf variables
        self.model = None
        self.criterion = None
        self.optimizer = None
        self.scheduler = None
        # lists that store the per-epoch training/validation avg. loss and accuracy
        self.train = []
        self.valid = []
        self.test = []
    # training step, defined as a class method
    def train_step(self, train_loader, epoch):
        self.model.train()
        if self.scheduler is not None:
            # note: recent PyTorch versions expect scheduler.step() to be
            # called after optimizer.step(), once per epoch
            self.scheduler.step()
        for batch_idx, (data, label) in enumerate(train_loader):
            output = self.model(data)
            loss = self.criterion(output, label)
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()
    # validation/test step, defined as a class method
    def test_step(self, test_loader):
        self.model.eval()
        test_loss = 0
        correct = 0
        with torch.no_grad():
            for data, label in test_loader:
                output = self.model(data)
                # accumulate the summed batch loss (the criterion reduces to the
                # mean by default, so multiply by the batch size)
                test_loss += self.criterion(output, label).item() * data.size(0)
                pred = output.max(1, keepdim=True)[1]  # index of the max log-probability
                correct += pred.eq(label.view_as(pred)).sum().item()
        test_loss /= len(test_loader.dataset)
        correct /= len(test_loader.dataset)
        return test_loss, correct
The class is initialized with some empty variables (model, criterion, etc.), but then I create the model and save it in the class:
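Roughly like this (the layer sizes match the description above, but the exact configuration is simplified here):

```python
import torch
import torch.nn as nn

# 784 -> 100 -> 100 -> 10 feed-forward model, as described above
model = nn.Sequential(
    nn.Linear(784, 100), nn.ReLU(),
    nn.Linear(100, 100), nn.ReLU(),
    nn.Linear(100, 10),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# these are then stored on the NNet instance, e.g.:
# net.model, net.criterion, net.optimizer = model, criterion, optimizer
```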