Train on one set of classes, report metrics on subsets of classes


I’ve been searching everywhere for how best to implement this, but I’ve come up short on anything helpful.

I’m attempting to build an image classifier in PyTorch that’s trained on one set of image classes (e.g. Dogs vs. Cats) and then report test accuracy metrics across subsets of the test dataset, to look for trends in how the model performs (e.g. subclasses like “Brown Dogs”, “White Dogs”, “Short Hair Dogs”, “Long Hair Dogs”).

I know I could build a single test dataloader by concatenating all the test sub-datasets, but I’m not sure whether there’s a smarter way to generate per-subset metrics so I can sort and display them in TensorBoard. Right now I’m about to just iterate through the ~100 different possible subsets in the training dataset and report metrics for each one individually after each epoch of training. That seems like it’d be far worse performance-wise than tagging each sample and collecting the metrics as one combined dataset is processed in the test step.
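To make the tagging idea concrete, here is a rough sketch of what I have in mind, not a claim that this is the recommended way to do it. `TaggedDataset`, `per_group_accuracy`, and `SignModel` are hypothetical names I made up for illustration, the toy tensors stand in for real images, and everything is assumed to live on the CPU. Each sub-dataset is wrapped so it also returns a subgroup id, the wrappers are joined with `ConcatDataset`, and correct/total counts per subgroup are accumulated in a single pass.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset, TensorDataset


class TaggedDataset(Dataset):
    """Hypothetical wrapper: adds a fixed subgroup id to each (image, label) pair."""

    def __init__(self, base, group_id):
        self.base = base
        self.group_id = group_id

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        image, label = self.base[idx]
        return image, label, self.group_id


@torch.no_grad()
def per_group_accuracy(model, loader, num_groups):
    """One pass over the loader; returns (accuracy, sample count) per subgroup."""
    correct = torch.zeros(num_groups, dtype=torch.long)
    total = torch.zeros(num_groups, dtype=torch.long)
    model.eval()
    for images, labels, groups in loader:
        preds = model(images).argmax(dim=1)
        hits = (preds == labels).long()
        # Scatter-add this batch's hits and counts into their subgroup slots.
        correct.index_add_(0, groups, hits)
        total.index_add_(0, groups, torch.ones_like(groups))
    # Subgroups with no samples report 0.0 rather than dividing by zero.
    accuracy = correct.float() / total.clamp(min=1).float()
    return accuracy, total


class SignModel(torch.nn.Module):
    """Toy stand-in for a classifier: predicts class 1 when the input is positive."""

    def forward(self, x):
        return torch.cat([-x, x], dim=1)  # logits for class 0 and class 1


# Two toy "subsets" with one feature per sample, in place of real image data.
subset_a = TensorDataset(torch.tensor([[1.0], [1.0], [-1.0], [-1.0]]),
                         torch.tensor([1, 1, 0, 0]))
subset_b = TensorDataset(torch.tensor([[1.0], [-1.0], [1.0], [-1.0]]),
                         torch.tensor([0, 0, 1, 1]))

combined = ConcatDataset([TaggedDataset(subset_a, 0), TaggedDataset(subset_b, 1)])
loader = DataLoader(combined, batch_size=3, shuffle=False)

acc, total = per_group_accuracy(SignModel(), loader, num_groups=2)
print(acc.tolist(), total.tolist())
```

Each entry of `acc` could then be written to TensorBoard with something like `SummaryWriter.add_scalar`, one tag per subgroup. If a sample can belong to several subgroups at once (e.g. both “Brown Dogs” and “Short Hair Dogs”), a single integer id would not be enough and a multi-hot membership vector would probably be needed instead.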

I’m by no means a PyTorch expert, and some of the more obscure PyTorch functions have documentation that I struggle to understand, so the answer could be in plain sight and I’m just not seeing it.