I used to put a lot of effort into training networks carefully. However, after talking to colleagues who train networks to beat benchmarks in academia, it sounds like a lot of what I did may be redundant given modern optimizers and what we now know about deep learning theory. I am hoping for a brief discussion. Here are the things I used to do which might not be necessary; they were inspired by traditional ML classes rather than deep-learning-specific ones:
1. Using a train-val-test split instead of just train-test: during training I used an additional validation split to check whether training was converging or overfitting. I have heard from colleagues that they simply use the running loss on the training set to check for convergence (and don't need to check for overfitting). A sketch of this check appears after the note on patience below.
2. Using learning rate schedulers: my colleagues no longer do this, because optimizers such as Adam have their own per-parameter learning rate adaptation. I know this to be true, but I still used a scheduler because it seemed like general good practice (minimal example after the list).
3. Changing training at a validation plateau (a combined sketch of both variants follows this list):
    3.1. I used ReduceLROnPlateau from PyTorch: when the validation loss stops improving for a number of epochs, the learning rate is reduced.
    3.2. I saved checkpoints until reaching a plateau, reset to the best checkpoint when the validation loss increased, then decreased the learning rate and continued.
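For item 2, this is roughly what I mean by a scheduler on top of Adam. A minimal sketch: the model, learning rate, and the choice of cosine annealing are placeholders for illustration, not my actual setup.

```python
import torch

# Placeholder model; any nn.Module would do here.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# An explicit schedule layered on top of Adam's per-parameter adaptation.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... one training epoch over the data goes here ...
    scheduler.step()  # decay the base learning rate once per epoch
```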
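And a minimal sketch of item 3, combining both variants: ReduceLROnPlateau handles 3.1, and the rewind to the best checkpoint (3.2) happens whenever the scheduler actually cuts the learning rate. The linear model and random data are stand-ins so the snippet runs on its own.

```python
import copy
import torch
import torch.nn as nn

# Synthetic stand-ins so the sketch runs; replace with real data and model.
torch.manual_seed(0)
X_train, y_train = torch.randn(256, 10), torch.randn(256, 1)
X_val, y_val = torch.randn(64, 10), torch.randn(64, 1)

model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# 3.1: halve the LR after `patience` epochs without validation improvement.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=10
)

# 3.2: keep the best weights so far, so we can rewind instead of
# continuing from an overfit state.
best_val, best_state = float("inf"), copy.deepcopy(model.state_dict())

for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(X_train), y_train)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(X_val), y_val).item()

    if val_loss < best_val:
        best_val, best_state = val_loss, copy.deepcopy(model.state_dict())

    prev_lr = optimizer.param_groups[0]["lr"]
    scheduler.step(val_loss)  # 3.1: reduce LR on plateau
    if optimizer.param_groups[0]["lr"] < prev_lr:
        # 3.2: the scheduler just cut the LR, so rewind to the best
        # checkpoint and continue training from there.
        model.load_state_dict(best_state)
```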
Notice that for items 1 and 3 I made these decisions based on a patience period: for example, declare overfitting/convergence if the validation loss hasn't decreased after 10 epochs.
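To make the patience logic concrete, here is a minimal sketch of the item-1 check as I ran it. Again, the model, data, and the patience of 10 are placeholders for illustration.

```python
import torch
import torch.nn as nn

# Synthetic stand-ins; in practice these come from the train/val split.
torch.manual_seed(0)
X_train, y_train = torch.randn(256, 10), torch.randn(256, 1)
X_val, y_val = torch.randn(64, 10), torch.randn(64, 1)

model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

patience = 10  # epochs without val improvement before declaring a plateau
best_val, epochs_without_improvement = float("inf"), 0

for epoch in range(1000):
    model.train()
    optimizer.zero_grad()
    criterion(model(X_train), y_train).backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(X_val), y_val).item()

    if val_loss < best_val:
        best_val, epochs_without_improvement = val_loss, 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            # Declare convergence/overfitting: no val improvement
            # for `patience` consecutive epochs.
            print(f"stopping at epoch {epoch}, best val loss {best_val:.4f}")
            break
```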
What do you think? What is redundant? Is anything potentially harmful?