Hi there,
I’m experimenting with stochastic weight averaging.
What I have done so far:
- Initialized the network with my current best model
- Reduced the learning rate from the best model’s training run by one order of magnitude.
- Trained for 20 epochs using EMA weighting with the following setup:
from torch.optim.swa_utils import AveragedModel, get_ema_multi_avg_fn
AveragedModel(self.model, multi_avg_fn=get_ema_multi_avg_fn(0.999))
However, this strategy did not improve the model.
Now I’m wondering if the strategy I chose makes sense. My questions are:
- Should I use a higher learning rate instead?
- Should I average over more epochs?
- Should I use equal-weight averaging (classic SWA) instead of EMA? Or just a different decay (alpha)?
I would be very grateful if someone could share their experience.
Thanks a lot!
Thorsten