Returning a network from a training function

My main training and validation loop looks like this:

import torch

def train(net,dataloader,loss_func,optimizer,device):
    
    net.train()
    num_true_pred = 0
    total_loss = 0
    
    for images,labels in dataloader:
        
        images = images.to(device)
        labels = labels.to(device)
        
        optimizer.zero_grad()
        
        outputs = net(images)
        loss = loss_func(outputs,labels)
        
        loss.backward()
        optimizer.step()
        
        class_preds = outputs > 0 # threshold raw logits at 0 (assumes BCEWithLogitsLoss)
        num_true_pred += torch.sum(class_preds == labels)
        
        total_loss += loss
    
    train_loss = total_loss.item() / len(dataloader)
    train_acc = num_true_pred.item() / len(dataloader.dataset) # fraction of samples, not batches
    
    return net,train_loss,train_acc

def validate(net,dataloader,loss_func,device):
    
    net.eval()
    num_true_pred = 0
    total_loss = 0
    
    for images,labels in dataloader:
        
        images = images.to(device)
        labels = labels.to(device)
        
        with torch.no_grad():
            outputs = net(images)
            loss = loss_func(outputs,labels)
        
        class_preds = outputs > 0 # threshold raw logits at 0 (assumes BCEWithLogitsLoss)
        num_true_pred += torch.sum(class_preds == labels)
        
        total_loss += loss
    
    val_loss = total_loss.item() / len(dataloader)
    val_acc = num_true_pred.item() / len(dataloader.dataset) # fraction of samples, not batches
    
    return val_loss,val_acc

# GPU

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# initialize datasets and dataloaders

train_dataset = ...
train_dataloader = ...
val_dataset = ...
val_dataloader = ...

# initialize net and move to GPU

net = ...
net = net.to(device)

# initialize loss function (e.g. binary cross entropy on logits, nn.BCEWithLogitsLoss)

loss_func = ...

# initialize optimizer (e.g. SGD)

optimizer = ...

# number of epochs to train and validate for

num_epochs = ...

for epoch in range(num_epochs):
    
    net,train_loss,train_acc = train(net,train_dataloader,loss_func,
                                     optimizer,device)
    
    val_loss,val_acc = validate(net,val_dataloader,loss_func,device)

My main question is about the train function. Do I need to return the network net as well as train_loss and train_acc? In other words, is net mutable, so that any changes made to it inside the train function are reflected outside of it? If so, I should be able to change the for loop at the end to:

for epoch in range(num_epochs):
    
    train_loss,train_acc = train(net,train_dataloader,loss_func,
                                 optimizer,device)
    
    val_loss,val_acc = validate(net,val_dataloader,loss_func,device)

Also, please let me know if there are other ways to improve this code, since this is the template that I use for all my training and validation loops.

Hi,

Yes, the net is modified in place by the optimizer, so there is no need to return it.
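A small check makes the in-place update concrete. The hypothetical helper train_one_step below runs a single SGD step and returns nothing, yet afterwards the weights of the net that was passed in have changed, and net.weight is still the very same Parameter object. The model, random data and learning rate are arbitrary assumptions for illustration. This works because Python passes a reference to the same module object into the function, and optimizer.step() updates the parameter tensors in place.

```python
import torch
import torch.nn as nn


def train_one_step(net, optimizer):
    # Hypothetical helper: one SGD step on random data, returns nothing
    x = torch.randn(8, 4)
    y = torch.randn(8, 1)
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    optimizer.step()  # updates net's parameters in place


torch.manual_seed(0)
net = nn.Linear(4, 1)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

weight_obj = net.weight                 # keep a reference to the Parameter object
before = net.weight.detach().clone()    # snapshot of the values

train_one_step(net, optimizer)          # note: no return value is used

changed = not torch.equal(before, net.weight.detach())
same_object = net.weight is weight_obj
print(changed, same_object)
```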

Thanks for the quick reply. Do you have any other suggestions for the structure of my code? I use this structure a lot, so I just want to know if there is any way to make it more efficient.

I think it’s quite good.
The only comments I would make are:
Do total_loss += loss.item() to convert the loss to a Python number directly. Otherwise total_loss stays a tensor attached to the autograd graph, and every += extends that graph for the whole epoch even though you never backpropagate through it.
You can wrap your validate function with @torch.no_grad() to disable autograd in the whole function if you want (you already do this around the forward pass, which is the most important part).
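Here is a minimal sketch of the two functions with both suggestions applied: the running loss is accumulated with loss.item(), and validate is decorated with @torch.no_grad(). It also divides the number of correct predictions by len(dataloader.dataset), so accuracy is a fraction of samples rather than of batches. It assumes nn.BCEWithLogitsLoss with float 0/1 labels shaped like the model output (here (N, 1)); the toy data, nn.Linear model and learning rate are made up for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train(net, dataloader, loss_func, optimizer, device):
    net.train()
    num_true_pred = 0
    total_loss = 0.0
    for images, labels in dataloader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = net(images)
        loss = loss_func(outputs, labels)
        loss.backward()
        optimizer.step()
        # .item() gives a plain Python float, so the running sum is not tracked by autograd
        total_loss += loss.item()
        # logits > 0 is equivalent to sigmoid(logits) > 0.5
        class_preds = outputs > 0
        num_true_pred += (class_preds == labels.bool()).sum().item()
    # mean of per-batch losses, and accuracy as a fraction of samples
    return total_loss / len(dataloader), num_true_pred / len(dataloader.dataset)


@torch.no_grad()  # autograd is disabled for everything inside validate
def validate(net, dataloader, loss_func, device):
    net.eval()
    num_true_pred = 0
    total_loss = 0.0
    for images, labels in dataloader:
        images, labels = images.to(device), labels.to(device)
        outputs = net(images)
        total_loss += loss_func(outputs, labels).item()
        num_true_pred += ((outputs > 0) == labels.bool()).sum().item()
    return total_loss / len(dataloader), num_true_pred / len(dataloader.dataset)


# Toy usage (hypothetical data and model): 64 samples, 3 features, labels of shape (N, 1)
torch.manual_seed(0)
x = torch.randn(64, 3)
y = (x[:, :1] > 0).float()
loader = DataLoader(TensorDataset(x, y), batch_size=16)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net = nn.Linear(3, 1).to(device)
loss_func = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

for epoch in range(3):
    train_loss, train_acc = train(net, loader, loss_func, optimizer, device)
    val_loss, val_acc = validate(net, loader, loss_func, device)
    print(f"epoch {epoch}: train {train_loss:.3f}/{train_acc:.2f}  val {val_loss:.3f}/{val_acc:.2f}")
```

Reusing the training loader for validation is only to keep the example short; in practice you would pass val_dataloader.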

Looks good otherwise.


Thanks a lot for the feedback! Really appreciate it.