Is there a downside to loading a pre-trained model?
When training a model, loading a pre-trained model offers the following advantages:
- Training is faster and needs fewer epochs
- It can help avoid getting stuck in poor local optima or saddle points
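To make the first point concrete, here is a toy sketch, not a real network: plain gradient descent on a one-parameter quadratic loss, started once from zero and once from weights that are already close to the optimum. The function name `steps_to_converge` and the values `source_optimum` and `target_optimum` are made up for illustration; they only stand in for "weights learned on a related task" and "the best weights for the new task".

```python
def steps_to_converge(w0, target, lr=0.1, tol=1e-6, max_steps=10_000):
    """Run gradient descent on the loss (w - target)**2 from w0.

    Returns the number of update steps taken before the loss drops below tol.
    """
    w = w0
    for step in range(max_steps):
        if (w - target) ** 2 < tol:
            return step
        grad = 2.0 * (w - target)  # derivative of (w - target)**2
        w -= lr * grad
    return max_steps


# Hypothetical values: the optimum of a related source task and of the new task.
source_optimum = 3.0
target_optimum = 3.5

# Training "from scratch" starts at 0; the warm start begins at the source optimum.
scratch_steps = steps_to_converge(0.0, target_optimum)
warm_steps = steps_to_converge(source_optimum, target_optimum)

print("from scratch:", scratch_steps, "steps")
print("warm start:  ", warm_steps, "steps")
```

In this toy setting the warm start converges in fewer steps only because the starting weights are already near the new optimum; the sketch says nothing about what happens when the source and target tasks differ a lot.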
Does that mean all training should start from a pre-trained model?
Or are there downsides to loading a pre-trained model?