How can I correctly optimize my training model?

Hello everyone, I am using Unet for image segmentation.
At the beginning, I used the most basic training strategy to train the model, and the results were acceptable. The code of the train function is below.

```
def train_model(model, optimizer, loss_fn, dataloader, args):
    model.train()
    for epoch in range(args.num_epochs):
        print('Epoch {}/{}'.format(epoch, args.num_epochs - 1))
        print('-' * 10)

        epoch_loss = 0
        step = 0
        acc = 0
        for i, (train_batch, labels_train) in enumerate(dataloader):
            step += 1
            train_batch, labels_train = Variable(train_batch), Variable(labels_train)
            labels_train = labels_train.float()

            optimizer.zero_grad()

            output_batch = model(train_batch)
            output_flat = output_batch.view(-1)
            true_flat = labels_train.view(-1)
            loss = loss_fn(output_flat, true_flat)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()

    epoch += 1
    makeDirectory(path, args.num_epochs)
    savepath = path + '/' + savename + '_' + str(epoch) + '/' + savename + '_' + str(epoch) + '.pth'

    torch.save(model.state_dict(), savepath)
    return model
```

Because I want to improve the results and optimize the model in a more principled way, I added some logic so that the model is saved according to the minimum epoch loss.

```
def train_model(model, optimizer, loss_fn, dataloader, args):
    min_loss = float('inf')
    model.train()
    for epoch in range(args.num_epochs):
        print('Epoch {}/{}'.format(epoch, args.num_epochs - 1))
        print('-' * 10)

        epoch_loss = 0
        step = 0
        acc = 0
        for i, (train_batch, labels_train) in enumerate(dataloader):
            step += 1
            train_batch, labels_train = Variable(train_batch), Variable(labels_train)
            labels_train = labels_train.float()

            optimizer.zero_grad()

            output_batch = model(train_batch)
            output_flat = output_batch.view(-1)
            true_flat = labels_train.view(-1)
            loss = loss_fn(output_flat, true_flat)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()

        # remember the weights whenever the epoch loss reaches a new minimum
        is_better = epoch_loss < min_loss
        if is_better:
            min_loss = epoch_loss
            # state_dict() returns references, so clone the tensors to keep a snapshot
            best_model = {k: v.clone() for k, v in model.state_dict().items()}

    epoch += 1
    makeDirectory(path, args.num_epochs)
    savepath = path + '/' + savename + '_' + str(epoch) + '/' + savename + '_' + str(epoch) + '.pth'

    torch.save(best_model, savepath)
    return model
```
But the strange thing is that the epoch_loss kept jumping around on a large scale. When I used the original code, there were some fluctuations in the loss, but the overall trend was downward, and the fluctuations only appeared after more than 100 epochs. I don't know why it does not work this way.

If I use a validation set with a very small number of pictures, like one image, and use the validation loss/accuracy to adjust the model parameters, is that more effective than only using the training set? If the validation function is called inside the train function, should I add with torch.no_grad() in the validation function?

Thanks in advance

Hey! I think you are thinking about this the right way. The danger of saving the model on the best training loss is that you risk overfitting. It would be better to save when your loss is low on a validation set, and I'd say that 1 image is probably not enough - but you might not need that many if you do random crops on a few big validation images.
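
To make the random-crop idea concrete, here is a minimal sketch (the helper name, patch size and number of patches are my assumptions, not something from this thread), assuming `image` and `mask` are PIL images or tensors of the same spatial size:

```
import torchvision.transforms as transforms
import torchvision.transforms.functional as TF

def random_val_patches(image, mask, patch_size=256, n_patches=8):
    """Cut several random patches out of one large validation image/mask pair."""
    patches = []
    for _ in range(n_patches):
        # sample one crop window and apply it to both the image and the mask
        i, j, h, w = transforms.RandomCrop.get_params(image, (patch_size, patch_size))
        patches.append((TF.crop(image, i, j, h, w), TF.crop(mask, i, j, h, w)))
    return patches
```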

Yes, you should use with torch.no_grad() when you are not training, to speed things up. It could also be nice to put the model in eval() mode.
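
As a minimal sketch of such a validation function (the names validate and val_loader, and the 0.5 threshold, are my assumptions; the threshold only makes sense if the network already outputs probabilities, e.g. through a sigmoid before BCELoss):

```
import torch

def validate(model, loss_fn, val_loader):
    model.eval()                          # dropout/batchnorm switch to evaluation behaviour
    val_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():                 # no gradients needed: saves memory and time
        for images, masks in val_loader:
            masks = masks.float()
            outputs = model(images)
            val_loss += loss_fn(outputs.view(-1), masks.view(-1)).item()
            preds = (outputs.view(-1) > 0.5).float()
            correct += (preds == masks.view(-1)).sum().item()
            total += masks.numel()
    model.train()                         # back to training mode for the next epoch
    return val_loss / len(val_loader), correct / total
```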

I don’t know why your “epoch_loss kept jumping in a large scale” - it wasn’t just a fluke?

PS: You can save several models if you have a decently sized hard drive - e.g. the one with the best validation loss, the best training loss, and the best validation accuracy. You just need to save them at different paths.
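
Roughly like this, for example (the helper and file names are made up for illustration):

```
import torch

best = {'train_loss': float('inf'), 'val_loss': float('inf'), 'val_acc': 0.0}

def maybe_save(model, metric, value, lower_is_better=True):
    """Save a separate checkpoint whenever `value` improves on the best seen for `metric`."""
    improved = value < best[metric] if lower_is_better else value > best[metric]
    if improved:
        best[metric] = value
        torch.save(model.state_dict(), 'best_{}.pth'.format(metric))

# at the end of each epoch:
# maybe_save(model, 'train_loss', epoch_loss)
# maybe_save(model, 'val_loss', val_loss)
# maybe_save(model, 'val_acc', val_acc, lower_is_better=False)
```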

@Oli Hello, thanks for your reply. I just tested it: if I set the model to eval() mode, the epoch loss "jumps on a large scale" like I said, and the validation accuracy stays at a fixed value after at most 3 epochs. I thought torch.no_grad() and eval() could accelerate the training process, but obviously eval() is causing a problem and the model stops training, so I removed it. I am not quite sure why this happens. Because my training set is quite small, I can only take one of the images as a validation set. Since I use randomly cropped patches to train the network, all of the full images are completely new to the model. Could the original images be used as the validation set, so that I can use the validation loss/accuracy to adjust the model parameters?

Some layers like dropout and batchnorm behave differently during training and evaluation; train() and eval() switch between these modes. Maybe I wasn't clear enough, but you typically put the model in eval() mode before evaluation and then put it back in train() mode once the validation is over and you want to train again.
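
You can see the flag these calls flip on a tiny model (just an illustration, not part of the code above):

```
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5), nn.BatchNorm1d(4))

model.eval()
print(model.training)   # False: dropout disabled, batchnorm uses running statistics

model.train()
print(model.training)   # True: normal training behaviour again
```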

Yes, the method you are describing definitely works. The question is whether it is the best way, or even a good way. If your training isn't very long, I'd say just try it and see what the results are. Personally I think 1 image is a bit too little for validation, but if you don't have any way of getting more data I guess that's it. Again, you can save several models and check which one is the best after training :slight_smile:

Edit: I know there is something called k-fold cross-validation, but I haven't seen it used in deep learning too much. Perhaps someone more experienced in this could help out?

@Oli Hey, thanks for your advice, I will try that first and see what happens. I checked a lot of examples that call the validation function inside the training process, and none of them put the model back into train() mode after validation. That's why I am so confused.

Few people use cross-validation in deep learning, but it is very popular with other machine learning algorithms. I am not very familiar with it; it seems the only difference between k-fold and normal training is that the training set is divided into K subsets.
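
For what it's worth, a rough sketch of k-fold splitting with PyTorch datasets could look like this (the dataset argument, k=5 and the batch size are assumptions for illustration):

```
import numpy as np
from sklearn.model_selection import KFold
from torch.utils.data import DataLoader, Subset

def kfold_loaders(dataset, k=5, batch_size=4):
    """Yield one (train_loader, val_loader) pair per fold."""
    kf = KFold(n_splits=k, shuffle=True, random_state=0)
    for train_idx, val_idx in kf.split(np.arange(len(dataset))):
        yield (DataLoader(Subset(dataset, train_idx), batch_size=batch_size, shuffle=True),
               DataLoader(Subset(dataset, val_idx), batch_size=batch_size))
```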


What loss_fn are you using? Did you write a custom one, since you are flattening the prediction and target before passing them to your loss function?

BCELoss is used as loss_fn.
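
For reference, nn.BCELoss on flattened tensors expects the predictions to already be probabilities in [0, 1] (i.e. after a sigmoid); whether the Unet here ends with a sigmoid isn't shown in the thread, so this is just an illustration:

```
import torch
import torch.nn as nn

logits = torch.randn(2, 1, 4, 4)                     # pretend network output for a 2-image batch
targets = torch.randint(0, 2, (2, 1, 4, 4)).float()  # binary segmentation masks

probs = torch.sigmoid(logits)                        # BCELoss needs values in [0, 1]
loss = nn.BCELoss()(probs.view(-1), targets.view(-1))

# BCEWithLogitsLoss fuses the sigmoid into the loss and is numerically more stable
loss_logits = nn.BCEWithLogitsLoss()(logits.view(-1), targets.view(-1))
```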