Setting learning rate for Stochastic Weight Averaging

Hi there,

I’m experimenting with stochastic weight averaging.

What I have done so far:

  • Initialized the network with my current best model
  • Reduced the learning rate I used for my current best model by one order of magnitude
  • Trained for 20 epochs using EMA weighting with the following setup:
    AveragedModel(self.model, multi_avg_fn=get_ema_multi_avg_fn(0.999))

However, this strategy did not improve the model.

Now I’m wondering if the strategy I chose makes sense. My questions are:

  • Should I use a higher learning rate instead?
  • Should I average over more epochs?
  • Should I use equal-weight averaging instead of EMA? Or just a different alpha?
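
For reference, the equal-weight alternative from the last question would look roughly like this (a sketch only: `AveragedModel` without `multi_avg_fn` keeps a running equal-weight mean, `SWALR` holds a constant averaging learning rate, and the `swa_lr` value and toy model/data are placeholders):

```python
import torch
from torch import nn
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

model = nn.Linear(4, 1)  # placeholder for the real network
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
swa_model = AveragedModel(model)  # no multi_avg_fn -> equal-weight mean
swa_scheduler = SWALR(optimizer, swa_lr=5e-3)  # constant SWA lr (placeholder value)

loader = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(5)]  # toy data
for epoch in range(20):
    for x, y in loader:
        loss = nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    swa_model.update_parameters(model)  # one averaging step per epoch
    swa_scheduler.step()

# recompute BatchNorm statistics for the averaged weights (no-op here,
# since the toy model has no BatchNorm layers)
update_bn(loader, swa_model)
```
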

I would be very grateful if someone could share their experience :slight_smile:

Thanks a lot!
Thorsten