Finetune example doubt at plot curve

rajasekhar · April 22, 2019, 5:06pm

I’m following the PyTorch fine-tuning tutorial. Below is the script that plots the validation accuracy curve:

ohist = []
shist = []

ohist = [h.cpu().numpy() for h in hist]
shist = [h.cpu().numpy() for h in scratch_hist]

plt.title("Validation Accuracy vs. Number of Training Epochs")
plt.xlabel("Training Epochs")
plt.ylabel("Validation Accuracy")
plt.plot(range(1,num_epochs+1),ohist,label="Pretrained")
plt.plot(range(1,num_epochs+1),shist,label="Scratch")
plt.ylim((0,1.))
plt.xticks(np.arange(1, num_epochs+1, 1.0))
plt.legend()
plt.show()

Could you help me understand this? I didn’t get what’s going on here: plt.plot(range(1,num_epochs+1),ohist,label="Pretrained")

ptrblck · April 22, 2019, 8:42pm

The code creates a plot for the validation accuracy of the fine-tuned model (ohist) and the model trained from scratch (shist).
Both accuracies will be visualized in a single plot, with the epochs on the x-axis and the accuracies on the y-axis.
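
To make the pairing concrete, here is a minimal sketch with made-up accuracy values (the numbers are illustrative, not results from the tutorial). range(1, num_epochs + 1) supplies the x-values and ohist the y-values, so plt.plot draws one point per epoch and connects them:

```python
# Illustrative values only; in the tutorial ohist is built from hist.
num_epochs = 3
ohist = [0.80, 0.88, 0.91]

epochs = list(range(1, num_epochs + 1))  # x-axis values: 1, 2, 3
points = list(zip(epochs, ohist))        # (epoch, accuracy) pairs that plt.plot connects

for epoch, acc in points:
    print(f"epoch {epoch}: validation accuracy {acc:.2f}")
```

Both lists need the same length, which is why the x-values run from 1 to num_epochs inclusive.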

PS: I’m not a fan of tagging certain people, as this might discourage others from posting an answer.

rajasekhar · April 25, 2019, 1:40am

Sure. Won’t tag again. So, here’s where hist came from:

model_ft, hist = train_model(model_ft, dataloaders_dict, criterion, optimizer_ft, num_epochs=num_epochs, is_inception=(model_name=="inception"))

and the train_model function:

def train_model(model, dataloaders, criterion, optimizer, num_epochs, is_inception=False):
    since = time.time()

    val_acc_history = []
    
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    # Special case for inception because in training it has an auxiliary output. In train
                    #   mode we calculate the loss by summing the final output and the auxiliary output
                    #   but in testing we only consider the final output.
                    if is_inception and phase == 'train':
                        # From https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                        outputs, aux_outputs = model(inputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4*loss2
                    else:
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, val_acc_history

That plots only the validation accuracy, but I would like to plot both of the following:

Training Acc/Loss vs. Epochs
Validation Acc/Loss vs. Epochs

Any suggestions?

ptrblck · April 25, 2019, 7:59am

You would have to store epoch_acc in another list for the training phase:

def train_model(...):
    train_acc_history = []
    val_acc_history = []
    ...

    for epoch in range(num_epochs):
        for phase in ['train', 'val']:
            ...

            if phase == 'val':
                val_acc_history.append(epoch_acc)
            elif phase == 'train':
                train_acc_history.append(epoch_acc)
            ...

    return model, val_acc_history, train_acc_history
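
The same idea extends to the loss. Below is a minimal sketch, assuming the per-phase work is wrapped in a hypothetical callable run_phase that returns (epoch_loss, epoch_acc) as Python floats; neither that helper nor the history dictionary is part of the tutorial:

```python
def train_model_sketch(run_phase, num_epochs):
    """Collect per-epoch loss and accuracy for both phases.

    run_phase is a hypothetical stand-in for the tutorial's inner loop;
    it returns (epoch_loss, epoch_acc) as Python floats.
    """
    history = {
        "train_loss": [], "train_acc": [],
        "val_loss": [], "val_acc": [],
    }
    for epoch in range(num_epochs):
        for phase in ["train", "val"]:
            epoch_loss, epoch_acc = run_phase(phase, epoch)
            # One list per metric and phase, one entry per epoch.
            history[f"{phase}_loss"].append(epoch_loss)
            history[f"{phase}_acc"].append(epoch_acc)
    return history


def fake_phase(phase, epoch):
    # Made-up numbers so the sketch runs without a model or data.
    base = 0.5 if phase == "train" else 0.6
    return 1.0 / (epoch + 1), base + 0.1 * epoch


history = train_model_sketch(fake_phase, num_epochs=3)
print(history["val_acc"])
```

Keeping loss and accuracy in separate lists leaves each list with exactly one value per epoch, which is what the plotting code expects. In the real loop epoch_acc is a tensor, so storing epoch_acc.item() would give plain floats.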

rajasekhar · April 26, 2019, 1:25pm

Thanks. I tried adding these lines:

            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)
                val_acc_history.append(epoch_loss)
            elif phase == 'train':
                train_acc_history.append(epoch_acc)
                train_acc_history.append(epoch_loss)

....
....
model_ft, hist = train_model(model_ft, dataloaders_dict, criterion, optimizer_ft, num_epochs=num_epochs, is_inception=(model_name=="inception"))
torch.save(model_ft.state_dict(), 'googlenet/standard_googlenet.pth')

ohist = []
#shist = []
ohist = [h.cpu.numpy() for h in hist] 
#shist = [h.cpu().numpy() for h in train_model.epoch_loss]

plt.title("Loss&Acc Vs. Number of Training Epochs")
plt.xlabel(" Epochs")
plt.ylabel("Accuracy & Loss")
plt.plot(range(1,num_epochs+1),ohist,label="Training Accuracy")
plt.plot(range(1,num_epochs+1),ohist,label="Training Loss")
plt.plot(range(1,num_epochs+1),ohist,label="Validation Accuracy")
plt.plot(range(1,num_epochs+1),ohist,label="Validation Loss")
plt.ylim((0,1.))
plt.xticks(np.arange(1, num_epochs+1, 1.0))
plt.legend()
plt.savefig('plots.png')
plt.show()

but it throws an error:

Traceback (most recent call last):
  File "ftune.py", line 284, in <module>
    model_ft, hist = train_model(model_ft, dataloaders_dict, criterion, optimizer_ft, num_epochs=num_epochs, is_inception=(model_name=="inception"))
ValueError: too many values to unpack (expected 2)

I also tried tensorboardX, using snippets like these:

from tensorboardX import SummaryWriter
writer = SummaryWriter('runs')
...
...
def train_model(train_loader, model, criterion, optimizer, epoch):
        if i % args.print_freq == 0:
            print('Epoch: [{0}][{1}/{2}]\t'
                  'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
                  'Data {data_time.val:.3f} ({data_time.avg:.3f})\t'
                  'Loss {loss.val:.4f} ({loss.avg:.4f})\t'
                  'Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t'
                  'Prec@5 {top5.val:.3f} ({top5.avg:.3f})'.format(
                   epoch, i, len(train_loader), batch_time=batch_time,
                   data_time=data_time, loss=losses, top1=top1, top5=top5))
            niter = (epoch * len(train_loader))+i
            writer.add_scalar('Train/Loss', losses.val, niter)
            writer.add_scalar('Train/Prec@1', top1.val, niter)
            writer.add_scalar('Train/Prec@5', top5.val, niter)

def validate(val_loader, model, criterion):

        if i % args.print_freq == 0:
            print('Test: [{0}/{1}]\t'
                  'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
                  'Loss {loss.val:.4f} ({loss.avg:.4f})\t'
                  'Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t'
                  'Prec@5 {top5.val:.3f} ({top5.avg:.3f})'.format(
                   i, len(val_loader), batch_time=batch_time, loss=losses,
                   top1=top1, top5=top5))
            niter = epoch*len(train_loader)+i  
            writer.add_scalar('Test/Loss', losses.val, niter)
            writer.add_scalar('Test/Prec@1', top1.val, niter)
            writer.add_scalar('Test/Prec@5', top5.val, niter)

But the validate function throws an error saying that epoch is not defined.

Could you share some snippets for tensorboardX?

ptrblck · April 27, 2019, 7:59pm

The first error is thrown because train_model now returns three objects/values, while the call still unpacks only two:

return model, val_acc_history, train_acc_history

The second is thrown, because epoch is indeed not defined in validate, so you could pass it as an argument to the function.
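
Both fixes can be sketched without PyTorch. The stub functions below are hypothetical stand-ins, and a plain list takes the place of the tensorboardX writer:

```python
def train_model_stub():
    # Hypothetical stand-in that returns three values, like the modified train_model.
    return "model", [0.8, 0.9], [0.7, 0.85]


# Two targets on the left, three values on the right: this raises ValueError.
try:
    model_ft, hist = train_model_stub()
except ValueError as err:
    message = str(err)
    print(message)

# One target per returned value fixes the unpacking error.
model_ft, val_hist, train_hist = train_model_stub()


def validate(val_items, epoch, log):
    # epoch is now a parameter, so it is defined inside the function.
    for i, item in enumerate(val_items):
        niter = epoch * len(val_items) + i  # global step, as in the logging snippet
        log.append((niter, item))           # stands in for writer.add_scalar(...)


log = []
for epoch in range(2):
    validate(["a", "b"], epoch, log)
print(log)
```

The sketch computes the step from the length of the validation data; the posted snippet uses len(train_loader) there, which would also have to be passed in or replaced.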

rajasekhar · May 2, 2019, 12:16pm

Can I export the validation plot data to a CSV file?

Any example?

ptrblck · May 2, 2019, 12:19pm

This should be possible using pandas.
You would have to create a pd.DataFrame containing your validation values and save it using df.to_csv.
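
A minimal sketch of that approach, assuming the validation values are already plain Python floats (the numbers, column names, and file name here are made up for illustration):

```python
import os
import tempfile

import pandas as pd

# Made-up values; in practice these would be the per-epoch lists from training.
val_acc = [0.80, 0.88, 0.91]
val_loss = [0.52, 0.35, 0.30]

df = pd.DataFrame({
    "epoch": list(range(1, len(val_acc) + 1)),
    "val_acc": val_acc,
    "val_loss": val_loss,
})

# Hypothetical output location; any writable path works.
csv_path = os.path.join(tempfile.mkdtemp(), "validation_history.csv")
df.to_csv(csv_path, index=False)  # index=False leaves out the row index column
print(csv_path)
```

If the history holds tensors, as val_acc_history does in the tutorial, they would need converting first, for example with h.cpu().numpy() as in the plotting code or with h.item().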

Niharika_Bhattacharj · December 19, 2019, 1:14am

Hi, I am a beginner and am running into the same problem. How do you change the code so that it expects all three objects? Even though all three objects are included in the return statement, why does it still say it expects only 2?

ptrblck · December 19, 2019, 2:08am

I’m not sure I understand the issue correctly.
Could you post the error message or explain your use case a bit?
Which function expects two arguments?

Niharika_Bhattacharj · December 19, 2019, 2:29am

I am just having trouble creating a graph for my loss values. The tutorial (Finetuning Torchvision Models — PyTorch Tutorials 2.2.0+cu121 documentation) only shows how to create the graph for validation accuracy, however I wanted to plot the validation loss as well. The error I am getting occurs after I try to make a list for a variable called val_loss_history, similar to how they wrote val_acc_history.

Apologies in advance if this is confusing.

Niharika_Bhattacharj · December 19, 2019, 2:30am

ptrblck · December 19, 2019, 4:14am

Since you are returning 3 values in train_model, you would also have to assign them:

model_ft, hist, losses = train_model(model_ft, dataloaders_dict, criterion, optimizer_ft, num_epochs=num_epochs, is_inception=(model_name=="inception"))
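
Once the histories come back as separate lists, accuracy and loss can go on separate axes. This is a sketch with made-up values, and the two-panel layout is only one possible choice; it is used here because the loss is not bounded by 1, so a shared plt.ylim((0,1.)) could clip it:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up histories, one float per epoch.
train_acc = [0.70, 0.82, 0.88]
val_acc = [0.75, 0.85, 0.87]
train_loss = [1.20, 0.60, 0.40]
val_loss = [1.00, 0.55, 0.45]

epochs = range(1, len(train_acc) + 1)
fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: accuracy for both phases, limited to [0, 1].
ax_acc.plot(epochs, train_acc, label="Training Accuracy")
ax_acc.plot(epochs, val_acc, label="Validation Accuracy")
ax_acc.set_xlabel("Epochs")
ax_acc.set_ylabel("Accuracy")
ax_acc.set_ylim(0, 1.0)
ax_acc.legend()

# Right panel: loss for both phases, with automatic y-limits.
ax_loss.plot(epochs, train_loss, label="Training Loss")
ax_loss.plot(epochs, val_loss, label="Validation Loss")
ax_loss.set_xlabel("Epochs")
ax_loss.set_ylabel("Loss")
ax_loss.legend()

plot_path = os.path.join(tempfile.mkdtemp(), "plots.png")  # hypothetical output location
fig.savefig(plot_path)
```

Each curve gets its own list here, rather than the same list being passed to every plt.plot call.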

PS: you can add code snippets by wrapping them in three backticks ```

Niharika_Bhattacharj · January 7, 2020, 1:41am

Thank you, I was also wondering how I could save the trained model after every epoch. I am currently saving it like so:

, but that saves the entire model.

ptrblck · January 7, 2020, 4:11am

That’s the right approach of saving the model’s state_dict.
Why is it unexpected to save the entire model?

Niharika_Bhattacharj · January 7, 2020, 4:31am

I want to save it using checkpoints, as my best val accuracy is at epoch 10 and I trained to epoch 14. I was looking at this discussion on how to do it, "How resume the saved trained model at specific epoch", but I don’t know how to change my lines (listed below) so that they save the model as a checkpoint at every epoch.

torch.save(model_ft.state_dict(), r'C:\Users\Niharika\Desktop\CavitiesCNN\results6\model.pth')
torch.save(optimizer.state_dict(), r'C:\Users\Niharika\Desktop\CavitiesCNN\results6\optimizer.pth')

ptrblck · January 7, 2020, 5:07am

If you want to store a checkpoint for every epoch, you could add the epoch counter to the name:

torch.save(model_ft.state_dict(), r'C:\Users\Niharika\Desktop\CavitiesCNN\results6\model_epoch{}.pth'.format(epoch))
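
The file naming can be sketched on its own, without PyTorch. The helper checkpoint_path and the directory name below are hypothetical, not part of the tutorial:

```python
from pathlib import Path


def checkpoint_path(directory, epoch):
    """Return a per-epoch checkpoint path such as <directory>/model_epoch10.pth."""
    return Path(directory) / "model_epoch{}.pth".format(epoch)


results_dir = Path("results6")  # hypothetical directory name
paths = [checkpoint_path(results_dir, epoch) for epoch in range(3)]
for p in paths:
    print(p)
```

Inside the epoch loop of train_model, the call would then presumably look like torch.save(model.state_dict(), checkpoint_path(results_dir, epoch)). Building the path with pathlib also avoids writing backslashes in a string literal. The checkpoint from the best epoch could later be restored with model.load_state_dict(torch.load(path)).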