Returning a network from a training function

Mahmoud_Abdelkhalek · October 13, 2020, 3:36pm

My main training and validation loop looks like this:

import torch

def train(net,dataloader,loss_func,optimizer,device):
    
    net.train()
    num_true_pred = 0
    total_loss = 0
    
    for images,labels in dataloader:
        
        images = images.to(device)
        labels = labels.to(device)
        
        optimizer.zero_grad()
        
        outputs = net(images)
        loss = loss_func(outputs,labels)
        
        loss.backward()
        optimizer.step()
        
        class_preds = outputs > 0 # for binary cross entropy
        num_true_pred += torch.sum(class_preds == labels)
        
        total_loss += loss
    
    train_loss = total_loss.item() / len(dataloader)
    train_acc = num_true_pred.item() / len(dataloader.dataset) # per sample, not per batch
    
    return net,train_loss,train_acc

def validate(net,dataloader,loss_func,device):
    
    net.eval()
    num_true_pred = 0
    total_loss = 0
    
    for images,labels in dataloader:
        
        images = images.to(device)
        labels = labels.to(device)
        
        with torch.no_grad():
            outputs = net(images)
            loss = loss_func(outputs,labels)
        
        class_preds = outputs > 0 # for binary cross entropy
        num_true_pred += torch.sum(class_preds == labels)
        
        total_loss += loss
    
    val_loss = total_loss.item() / len(dataloader)
    val_acc = num_true_pred.item() / len(dataloader.dataset) # per sample, not per batch
    
    return val_loss,val_acc

# GPU

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# initialize datasets and dataloaders

train_dataset = ...
train_dataloader = ...
val_dataset = ...
val_dataloader = ...

# initialize net and move to GPU

net = ...
net = net.to(device)

# initialize loss function (e.g. binary cross entropy)

loss_func = ...

# initialize optimizer (e.g. SGD)

optimizer = ...

# number of epochs to train and validate for

num_epochs = ...

for epoch in range(num_epochs):
    
    net,train_loss,train_acc = train(net,train_dataloader,loss_func,
                                     optimizer,device)
    
    val_loss,val_acc = validate(net,val_dataloader,loss_func,device)

My main question is about the train function. Do I need to return the network net along with train_loss and train_acc? In other words, is net mutable, so that any changes made to it inside train are visible outside the function? If so, I should be able to change the for loop at the end to:

for epoch in range(num_epochs):
    
    train_loss,train_acc = train(net,train_dataloader,loss_func,
                                 optimizer,device)
    
    val_loss,val_acc = validate(net,val_dataloader,loss_func,device)

Also, please let me know if there are other ways to improve this code, since this is the template that I use for all my training and validation loops.

albanD · October 13, 2020, 3:38pm

Hi,

Yes, the net is modified in place by the optimizer, so there is no need to return it.
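
A minimal sketch of what "modified in place" means here, in case it helps. The train_step helper, the nn.Linear model, and the random data are hypothetical stand-ins, not the code from the question. The function returns nothing, yet the caller's net has different weight values afterwards, and net.weight is still the same Parameter object:

```python
import torch
from torch import nn

def train_step(net, optimizer):
    # Hypothetical single training step on random data; note there is no return.
    inputs = torch.randn(8, 4)
    targets = torch.randn(8, 1)
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(net(inputs), targets)
    loss.backward()
    optimizer.step()  # writes new values into net's existing parameter tensors

torch.manual_seed(0)
net = nn.Linear(4, 1)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

weight_before = net.weight.detach().clone()  # snapshot of the values
weight_id_before = id(net.weight)            # identity of the Parameter object

train_step(net, optimizer)

changed = not torch.equal(weight_before, net.weight.detach())
same_object = id(net.weight) == weight_id_before
print(changed, same_object)
```

The optimizer holds references to the same Parameter objects that net owns, which is why the update is visible through net without passing anything back.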

Mahmoud_Abdelkhalek · October 13, 2020, 3:56pm

Thanks for the quick reply. Do you have any other suggestions for improving the structure of my code? I use this structure a lot, so I just want to know if there is any way of making it more efficient.

albanD · October 13, 2020, 5:46pm

I think it’s quite good.
The only comments I would make are:
Use total_loss += loss.item() to convert the loss to a Python number directly (and then drop the .item() call on total_loss at the end). This makes sure you don’t keep the autograd graph around for things that don’t need it.
You can decorate your validate function with @torch.no_grad() to disable autograd for the whole function if you want (you already do it for the model’s forward pass, which is the most important part).

Looks good otherwise.
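
For reference, the two suggestions above could be applied to validate roughly as follows. This is a sketch, not a definitive version. The toy dataset, the nn.Linear model, and the choice of BCEWithLogitsLoss are assumptions made only so the example runs. Dividing the correct count by len(dataloader.dataset) is also an assumption: it treats accuracy as a per-sample fraction and expects a map-style dataset with no dropped batches.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

@torch.no_grad()  # disables autograd for everything inside the function
def validate(net, dataloader, loss_func, device):
    net.eval()
    num_true_pred = 0
    total_loss = 0.0

    for images, labels in dataloader:
        images = images.to(device)
        labels = labels.to(device)

        outputs = net(images)
        loss = loss_func(outputs, labels)

        class_preds = outputs > 0  # a logit above 0 means a probability above 0.5
        num_true_pred += torch.sum(class_preds == labels).item()
        total_loss += loss.item()  # plain Python float, no graph attached

    val_loss = total_loss / len(dataloader)            # mean loss per batch
    val_acc = num_true_pred / len(dataloader.dataset)  # assumed: fraction of samples
    return val_loss, val_acc

# Toy binary classification data, purely illustrative.
torch.manual_seed(0)
features = torch.randn(32, 4)
targets = (features.sum(dim=1, keepdim=True) > 0).float()
loader = DataLoader(TensorDataset(features, targets), batch_size=8)

net = nn.Linear(4, 1)
val_loss, val_acc = validate(net, loader, nn.BCEWithLogitsLoss(),
                             torch.device('cpu'))
print(type(val_loss).__name__, val_acc)
```

The same loss.item() accumulation applies inside train. The decorator does not, since train needs gradients for loss.backward().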

Mahmoud_Abdelkhalek · October 13, 2020, 5:57pm

Thanks a lot for the feedback! Really appreciate it.