Dear PyTorch Community,

I am currently working on a small sanity check for my RNN using sequential MNIST classification, and I was wondering whether I need to collect the loss and other metrics, such as top-1 and top-5 accuracy, in a list and then compute the average of that list.

This is currently done in `def train(train_loader, model, optimizer, loss_f):`, which is then called from `def main():`, where the training loop over the epochs runs. Please correct my understanding: the train function performs one iteration per batch within a single epoch. If this is the case, then I should collect the loss and metrics in lists and average them once the iterations are over, i.e. once an epoch has ended, correct?
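For reference, the collect-then-average approach described above can also be written as a running average that weights each batch by its size, which avoids keeping lists at all. This is a minimal sketch in plain Python. `AverageMeter` is a hypothetical helper, not part of PyTorch, and the sketch assumes each per-batch value is already a mean over that batch. If `lst_avg` (not shown) takes a plain mean of the per-batch values, a smaller final batch counts as much as a full one, and the weighting corrects that:

```python
class AverageMeter:
    """Running, sample-weighted average of a per-batch metric (hypothetical helper)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, batch_value, batch_size):
        # batch_value is a mean over the batch, so weight it by the number of samples
        self.total += batch_value * batch_size
        self.count += batch_size

    @property
    def avg(self):
        # guard against reading the average before any update
        return self.total / self.count if self.count else 0.0


# Usage: two full batches of 64 and a shorter final batch of 32
loss_meter = AverageMeter()
for batch_loss, n in [(0.9, 64), (0.7, 64), (0.4, 32)]:
    loss_meter.update(batch_loss, n)

# weighted: 115.2 / 160 = 0.72, whereas a plain mean of the three values is ~0.667
print(round(loss_meter.avg, 4))  # 0.72
```

Inside `train()`, `update` would be called once per batch with, for example, `loss_val.item()` and `y.size(0)`, and `.avg` would be read after the loop over `train_loader` finishes.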

```
import torch.nn.functional as F


def train(train_loader, model, optimizer, loss_f):
    '''
    Input: train loader (torch loader), model (torch model), optimizer (torch optimizer),
    loss function (torch loss, here nn.CrossEntropyLoss).
    Output: (loss, top1 accuracy, top5 accuracy), each averaged over the epoch.
    '''
    # device, top1accuracy, top5accuracy and lst_avg are defined elsewhere in the script
    model.train()
    loss_lst = []
    top1_acc_lst = []
    top5_acc_lst = []
    for batch_idx, (x, y) in enumerate(train_loader):
        x, y = x.to(device), y.to(device)
        # turn [64, 784] into [64, 784, 784] by repeating the full image along dim 1
        x_expanded = x[:, None, ...].expand(x.shape[0], x.shape[1], x.shape[1])
        #x_expanded = x.reshape(-1, sequence_length, input_size)
        out = model(x_expanded)  # raw logits, shape [batch, 10]
        del x
        del x_expanded
        # nn.CrossEntropyLoss applies log-softmax internally, so pass it the raw logits
        loss_val = loss_f(out, y)
        # softmax only for the metrics; detach so no autograd graph is built for them
        probs = F.softmax(out.detach(), dim=1)
        # the last batch can be smaller than batch_size, so use the actual batch length
        n = y.size(0)
        # store top1 accuracy, top5 accuracy and loss per iteration in lists
        top1_acc_lst.append(top1accuracy(probs, y, n))
        top5_acc_lst.append(top5accuracy(probs, y, n))
        loss_lst.append(loss_val.item())
        del y
        del out
        optimizer.zero_grad()
        loss_val.backward()
        optimizer.step()
    # compute the average within each list to obtain the final value for a single epoch
    top1_acc = lst_avg(top1_acc_lst)
    top5_acc = lst_avg(top5_acc_lst)
    loss_val = lst_avg(loss_lst)
    return (loss_val, top1_acc, top5_acc)
```
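A note on pairing `F.softmax` with `nn.CrossEntropyLoss`: the PyTorch loss applies log-softmax to its input internally and expects raw logits. If softmax probabilities are passed in, softmax is effectively applied twice, and the loss gets squashed into a narrow range. With 10 classes, even a perfect one-hot probability vector gives a loss of log(e + 9) - 1 ≈ 1.461, never close to 0. This may be why the logged losses further down level off around 1.6 even at roughly 87% top-1 accuracy. The sketch below reproduces the per-sample computation in plain Python (no PyTorch needed). The helper names are illustrative:

```python
import math


def log_softmax(values):
    # numerically stable log-softmax: v - logsumexp(values)
    m = max(values)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - log_sum_exp for v in values]


def cross_entropy(inputs, target):
    # per-sample value computed by nn.CrossEntropyLoss: -log_softmax(inputs)[target]
    return -log_softmax(inputs)[target]


def softmax(values):
    return [math.exp(v) for v in log_softmax(values)]


logits = [10.0] + [0.0] * 9                   # confident, correct prediction for class 0
correct = cross_entropy(logits, 0)            # logits passed directly: close to 0
doubled = cross_entropy(softmax(logits), 0)   # softmax applied before the loss

# Lower bound once probabilities are passed in: a perfect one-hot vector
floor = cross_entropy([1.0] + [0.0] * 9, 0)   # equals log(e + 9) - 1

print(f"{correct:.4f} {doubled:.4f} {floor:.4f}")
```

Top-1 and top-5 accuracy are unaffected by this choice, because softmax does not change the ranking of the classes. Only the loss value and its gradients are distorted.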

```
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader


def main():
    # nlayers, hidden_size, input_size, lr, weight_decay, batch_size, nepochs, data_dir,
    # device, SimpleRNN and test() are defined elsewhere in the script
    print(f'Simple RNN initalised with {nlayers} layers and {hidden_size} number of hidden neurons.')
    model = SimpleRNN(input_size=input_size*input_size, hidden_size=hidden_size, num_layers=nlayers,
                      output_size=10, activation='relu').to(device)
    optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=145, eta_min=0)
    loss_f = nn.CrossEntropyLoss()
    train_loss_lst = []
    test_loss_lst = []
    train_top1acc_lst = []
    test_top1acc_lst = []
    train_top5acc_lst = []
    test_top5acc_lst = []
    last_epoch = 0
    train_dataset = torchvision.datasets.MNIST(root=data_dir,
                                               train=True,
                                               transform=T.Compose([T.ToTensor(), T.Lambda(torch.flatten)]),
                                               download=True)
    test_dataset = torchvision.datasets.MNIST(root=data_dir,
                                              train=False,
                                              transform=T.Compose([T.ToTensor(), T.Lambda(torch.flatten)]))
    train_loader = DataLoader(dataset=train_dataset,
                              batch_size=batch_size,
                              shuffle=True)
    test_loader = DataLoader(dataset=test_dataset,
                             batch_size=batch_size,
                             shuffle=False)
    for epoch in range(nepochs - last_epoch):
        # 1. linear increase from 0.00001 to 0.0001 over 5 epochs
        #    (epoch 0 uses the initial lr passed to Adam)
        if 0 < epoch + last_epoch <= 5:
            optimizer.param_groups[0]['lr'] = 0.00001 + (0.00009 / 5) * (epoch + last_epoch)
        # 2. decrease from 0.0001 to 0 using cosine annealing
        elif epoch + last_epoch > 5:
            scheduler.step()
        train_loss_value, train_top1acc_value, train_top5acc_value = train(train_loader, model, optimizer, loss_f)
        train_loss_lst.append(train_loss_value)
        train_top1acc_lst.append(train_top1acc_value)
        train_top5acc_lst.append(train_top5acc_value)
        test_loss_value, test_top1acc_value, test_top5acc_value = test(test_loader, model, loss_f)
        test_loss_lst.append(test_loss_value)
        test_top1acc_lst.append(test_top1acc_value)
        test_top5acc_lst.append(test_top5acc_value)
        print(f"Epoch:{epoch + last_epoch + 1} Train[Loss:{train_loss_value} Top5 Acc:{train_top5acc_value} Top1 Acc:{train_top1acc_value}]")
        print(f"Epoch:{epoch + last_epoch + 1} Test[Loss:{test_loss_value} Top5 Acc:{test_top5acc_value} Top1 Acc:{test_top1acc_value}]")
```
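The two branches at the top of the epoch loop aim to produce a linear warm-up followed by cosine annealing. To check the intended values per epoch, here is that schedule written as a closed-form function of the epoch. This is a sketch under a few assumptions: the initial `lr` is 1e-5, so that epoch 0 matches the warm-up comment; the peak is 1e-4 after 5 warm-up epochs; and the decay follows the formula documented for `CosineAnnealingLR` with `T_max = 145` and `eta_min = 0`. The function name and its defaults are illustrative:

```python
import math


def lr_at_epoch(epoch, lr_start=1e-5, lr_peak=1e-4, warmup_epochs=5,
                t_max=145, eta_min=0.0):
    """Learning rate for a 0-based epoch: linear warm-up, then cosine annealing."""
    if epoch <= warmup_epochs:
        # same values as 0.00001 + (0.00009 / 5) * epoch in main()
        return lr_start + (lr_peak - lr_start) * epoch / warmup_epochs
    # eta_min + (eta_max - eta_min) * (1 + cos(pi * T_cur / T_max)) / 2
    t_cur = min(epoch - warmup_epochs, t_max)
    return eta_min + 0.5 * (lr_peak - eta_min) * (1 + math.cos(math.pi * t_cur / t_max))


for e in (0, 1, 5, 6, 80, 150):
    print(e, f"{lr_at_epoch(e):.3e}")
```

An alternative design is to return `lr_at_epoch(e) / lr_start` from the `lr_lambda` of `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's initial learning rate by that factor each epoch. A single scheduler then owns the learning rate, instead of manual `param_groups` writes being mixed with `CosineAnnealingLR`.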

However, a few things strike me as a little odd. For one, test accuracy always seems to be a little better than train accuracy, even though weight_decay = 0.0005 only enforces a very small amount of regularisation. In theory and in practice, regularisation could explain a small performance edge of test over train, but since it is so weak here, I suspect I am doing something incorrectly. Furthermore, if I do not compute the metrics by averaging, my accuracy is capped at 0.5. I suspected this was because I was initially only retaining the metric from the last iteration within an epoch.
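Here is one possible source of the 0.5 ceiling. It assumes that `top1accuracy` (not shown) divides the number of correct predictions by its `batch_size` argument, and that `batch_size` is 64, as the `[64, 784]` comment in `train()` suggests. The 60,000 MNIST training images do not split evenly into batches of 64. With the `DataLoader` default `drop_last=False`, the final batch of every epoch is therefore shorter. A quick check of that arithmetic in plain Python:

```python
# Assumptions: 60,000 MNIST training images, batch_size = 64, and an accuracy
# helper that divides by the global batch_size instead of the actual batch length.
train_size = 60_000
batch_size = 64

num_full_batches, last_batch_size = divmod(train_size, batch_size)
print(num_full_batches, last_batch_size)  # 937 32

# Even if all 32 images in the last batch are classified correctly,
# dividing the correct count by 64 gives at most 0.5.
max_last_batch_acc = last_batch_size / batch_size
print(max_last_batch_acc)  # 0.5
```

Dividing by the actual batch length (`y.size(0)`) instead of the global `batch_size` would remove this ceiling. With `shuffle=True`, the last batch holds different images each epoch, but always 32 of them.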

However, when averaging, I now reach quite acceptable performance metrics during training, well above 0.5, as can be seen below:

```
Simple RNN initalised with 2 layers and 64 number of hidden neurons.
Epoch:1 Train[Loss:2.282 Top5 Acc:0.6882 Top1 Acc:0.1736]
Epoch:1 Test[Loss:2.1989 Top5 Acc:0.8553 Top1 Acc:0.2897]
Epoch:2 Train[Loss:1.8816 Top5 Acc:0.9194 Top1 Acc:0.6155]
Epoch:2 Test[Loss:1.7117 Top5 Acc:0.9668 Top1 Acc:0.7689]
Epoch:3 Train[Loss:1.6802 Top5 Acc:0.9703 Top1 Acc:0.7997]
Epoch:3 Test[Loss:1.6411 Top5 Acc:0.9752 Top1 Acc:0.8336]
Epoch:4 Train[Loss:1.6395 Top5 Acc:0.9717 Top1 Acc:0.8345]
Epoch:4 Test[Loss:1.6024 Top5 Acc:0.9749 Top1 Acc:0.8623]
Epoch:5 Train[Loss:1.6008 Top5 Acc:0.9779 Top1 Acc:0.867]
Epoch:5 Test[Loss:1.5862 Top5 Acc:0.9718 Top1 Acc:0.8763]
Epoch:6 Train[Loss:1.6007 Top5 Acc:0.9717 Top1 Acc:0.865]
Epoch:6 Test[Loss:1.5916 Top5 Acc:0.9775 Top1 Acc:0.8713]
```

Please let me know what you think and whether my understanding is correct. I would be happy to learn.

Kind regards,

weight_thetas