You won’t be able to combine the two models in any naive way. The
problem is that while model A has learned to distinguish a “Worm”
from a “Brittlestar” and model B has learned to distinguish a “Prawn”
from a “Seaspider,” neither model has ever been trained to distinguish
a “Worm” from a “Prawn.”
So unless you do some training on data that includes both “Worm” and
“Prawn” samples, your combined model will likely not work well.
You appear to be training a multi-class classifier. A standard loss
function for doing so is CrossEntropyLoss. One common technique for
dealing with skewed (unbalanced) data is to use CrossEntropyLoss’s
weight constructor argument (a 1D tensor with one entry per class)
to weight less common classes more heavily in your training.
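As a rough sketch of the class-weighting idea, something like the following could work. The class counts are made up for illustration, and inverse-frequency weighting is just one common choice, not necessarily the best one for your data.

```python
import torch
import torch.nn as nn

# Hypothetical per-class sample counts for four classes
# (e.g. Worm, Brittlestar, Prawn, Seaspider); substitute your own.
class_counts = torch.tensor([500.0, 50.0, 200.0, 20.0])

# Inverse-frequency weights: the rarer the class, the larger its weight.
class_weights = class_counts.sum() / (len(class_counts) * class_counts)

# weight must be a 1D tensor with one entry per class.
criterion = nn.CrossEntropyLoss(weight=class_weights)

# Dummy batch standing in for model output: 8 samples, 4 class logits each.
logits = torch.randn(8, 4)
targets = torch.randint(0, 4, (8,))
loss = criterion(logits, targets)
```

If your model runs on a GPU, the weight tensor has to be moved to the same device as the logits.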
Depending on how your training data is structured, you could also
experiment with WeightedRandomSampler.
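A minimal sketch of the sampler approach, assuming your labels are available as a single tensor up front. The dataset here is random dummy data, and the per-sample weights are simply the inverse of each sample's class count.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical unbalanced dataset: 90 samples of class 0, 10 of class 1.
labels = torch.cat([torch.zeros(90, dtype=torch.long),
                    torch.ones(10, dtype=torch.long)])
features = torch.randn(100, 16)
dataset = TensorDataset(features, labels)

# One weight per sample: the inverse of that sample's class count,
# so every class has the same total probability of being drawn.
class_counts = torch.bincount(labels)
sample_weights = 1.0 / class_counts[labels].float()

# Sampling with replacement lets rare samples appear several times per epoch.
sampler = WeightedRandomSampler(weights=sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)

# Pass sampler instead of shuffle=True; the two options are mutually exclusive.
loader = DataLoader(dataset, batch_size=20, sampler=sampler)
```

With this setup each batch should be roughly balanced on average, at the cost of showing the model the same rare samples repeatedly, so it is worth watching for overfitting on the small classes.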