I’m training a VAE whose latents will feed the later (diffusion) stage of the latent diffusion model framework. I tried a few configurations (e.g. changing the number of resblocks or the number of channels), and the training loop for each configuration is run 5 times (corresponding to 5 seeds).
Question: which seed’s model parameters should I use for the later stage (training the UNet): the seed that achieves the best validation loss, or the one with the median validation loss? For reporting the performance of the VAE, we can use the mean validation loss across the seeds (Report model training results among different seeds). But for training, we can only choose one.
If your goal is to have the best trained model to then use to generate stuff,
choose the trained model (that happens to have come from some specific seed)
that has the best validation loss.
On the other hand, if you are, say, researching training protocols, it would
probably make sense to report the median validation loss (or more generally,
the distribution of validation losses) obtained from a series of training runs.
Picking the best model is a fully legitimate approach, provided that you don’t
pick the model that performs best on your held-out test set.
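In code, that selection is just an argmin over the seeded runs. Here is a minimal sketch, assuming you have the final validation loss of each run; the seed-to-loss numbers below are made up purely for illustration:

```python
# Hypothetical sketch: pick the VAE checkpoint with the lowest validation
# loss across the five seeded runs; report mean/median separately.
import statistics

# Assumed format {seed: final_validation_loss}; values are illustrative.
val_losses = {0: 0.162, 1: 0.158, 2: 0.171, 3: 0.160, 4: 0.165}

best_seed = min(val_losses, key=val_losses.get)
print(f"use checkpoint from seed {best_seed} for the UNet stage")
print(f"report mean   val loss: {statistics.mean(val_losses.values()):.3f}")
print(f"report median val loss: {statistics.median(val_losses.values()):.3f}")
```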
By way of example, suppose you have a function defined on a square and you want
to find an approximate minimum of that function. A legitimate algorithm would be
to sample a set of random points from that square, evaluate the function at those
points, and keep the lowest value. This generally won’t be anywhere near the best
algorithm, but it’s logically sensible.
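A toy sketch of that random-search procedure, with a made-up quadratic standing in for the function on the square:

```python
# Sample random points in the square, evaluate the function at each,
# and keep the lowest value seen.
import random

def f(x, y):
    # Stand-in objective for illustration; any function on the square works.
    return (x - 0.3) ** 2 + (y + 0.5) ** 2

rng = random.Random(0)
samples = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(1000)]
best = min(samples, key=lambda p: f(*p))
print(best, f(*best))  # approximate minimizer and its value
```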
Elaborating: sample a set of random starting points, run some number of
gradient-descent optimization steps from each starting point, and keep the
lowest optimized value. This would also be legitimate, and it is akin to
keeping the best validation loss produced by your five randomly-seeded
training runs.
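And a sketch of that multi-start variant, reusing the same toy objective; keeping the best of the five optimized values mirrors keeping the best of your five seeded runs:

```python
# Run gradient descent from several random starts and keep the best
# optimized value -- analogous to selecting the best seeded training run.
import random

def f(x, y):
    # Same stand-in objective as above.
    return (x - 0.3) ** 2 + (y + 0.5) ** 2

def grad_f(x, y):
    # Analytic gradient of the toy objective.
    return 2 * (x - 0.3), 2 * (y + 0.5)

def descend(x, y, lr=0.1, steps=100):
    # Plain gradient descent from the given starting point.
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

rng = random.Random(0)
starts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(5)]
results = [descend(x0, y0) for x0, y0 in starts]
best = min(results, key=lambda p: f(*p))
print(best, f(*best))  # best optimized point across the five starts
```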