Best practices for collecting metrics

Hi,

I’ve been struggling with collecting many different metrics in PyTorch train/val loops. I always end up with boilerplate that looks like this:

for epoch in range(num_epochs):

  metric1_train = []
  metric2_train = []
  metric3_train = []
  ...
  
  model.train()
  for batch in train_loader:
      ...
      metric1_train.append(...)
      metric2_train.append(...)
      metric3_train.append(...)
      ...
  
  wandb.log({
    "train/metric1": np.mean(metric1_train),
    ...
  })

  model.eval()
  with torch.no_grad():
      metric1_val = []
      metric2_val = []
      metric3_val = []
      ...
      for batch in val_loader:
        ...
        metric1_val.append(...)
        metric2_val.append(...)
        metric3_val.append(...)
        ...

  wandb.log({
    "val/metric1": np.mean(metric1_val),
    ...
  })

I don’t want to repeat all that code in the validation part. I tried tackling this in an OOP way, but it didn’t end up pretty. I’d also like it to be easy to add new metrics, which I couldn’t handle well either.

Is there a standard, best-practice way to do something like this that scales to tens of metrics?

Thanks!

I think you might be interested in “PyTorch Lightning” and “TorchMetrics”, which build on top of PyTorch:

https://torchmetrics.readthedocs.io/en/latest/

In case you want to stick with vanilla PyTorch, I would use separate dictionaries for:

  • input data
  • model outputs
  • losses
  • metrics

That way you can avoid a lot of duplicated lines with a dictionary comprehension.
for batch_dict in train_loader:
   preds_dict = model(**batch_dict)
   loss_dict = loss_fn(**preds_dict, **batch_dict)
   metrics_dict = metric_fn(**preds_dict, **batch_dict)
   wandb.log(metrics_dict)

I like this style because each function automatically unpacks only the inputs it needs; the rest of the data is absorbed by **kwargs in the function signature.
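To make the epoch-level averaging concrete, here is a framework-free sketch (`run_epoch` and the toy `metric_fn` are hypothetical names, not part of any library) showing how one helper plus a single dict comprehension replaces the per-metric lists, and is reused unchanged for both train and val:

```python
from collections import defaultdict
from statistics import mean

def run_epoch(batches, metric_fn, prefix):
    """Accumulate per-batch metric dicts, then average each metric.

    `batches` stands in for a DataLoader and `metric_fn` for whatever
    computes a {name: value} dict per batch; the averaging logic is
    the reusable part.
    """
    totals = defaultdict(list)
    for batch in batches:
        for name, value in metric_fn(batch).items():
            totals[name].append(value)
    # One dict comprehension replaces the per-metric boilerplate, and
    # the prefix gives you the "train/..." vs "val/..." keys for wandb.
    return {f"{prefix}/{name}": mean(vals) for name, vals in totals.items()}

# Toy usage where each "batch" is already a metric dict:
fake_batches = [{"loss": 1.0, "acc": 0.5}, {"loss": 0.5, "acc": 0.7}]
logged = run_epoch(fake_batches, lambda b: b, prefix="train")
# logged == {"train/loss": 0.75, "train/acc": 0.6}
```

Adding a new metric is then just one more key in whatever `metric_fn` returns; no other code changes.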
Hope this helps!
