A very strange phenomenon I encountered while training machine translation

[Screenshots of the training log: the loss increases and the accuracy decreases as epoch 0 progresses]

Namespace(batch_size=32, cuda=True, d_inner_hid=2048, d_k=64, d_model=512, d_v=64, d_word_vec=512, data='data/multi30k.atok.low.pt', dropout=0.1, embs_share_weight=False, epoch=10, label_smoothing=True, log=None, max_token_seq_len=102, n_head=8, n_layers=6, n_warmup_steps=4000, no_cuda=False, proj_share_weight=True, save_mode='best', save_model='data/trained', src_vocab_size=28699, tgt_vocab_size=52799)
cuda device count: 2
[ Epoch 0 ]

  • (Training) : 79%|▊| 7034/8883 [56:34<15:09, 2.03it/s]

learning rate update code:

    def _get_lr_scale(self):
        # warmup: the scale grows linearly with the step until n_warmup_steps,
        # then decays as the inverse square root of the step
        return np.min([np.power(self.n_current_steps, -0.5),
                       np.power(self.n_warmup_steps, -1.5) * self.n_current_steps])

    def _update_learning_rate(self):
        ''' Learning rate scheduling per step '''
        self.n_current_steps += 1
        lr = self.init_lr * self._get_lr_scale()
        # apply the scheduled rate to every parameter group of the wrapped optimizer
        for param_group in self._optimizer.param_groups:
            param_group['lr'] = lr

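For context, this is the warmup-then-decay schedule from the original Transformer setup: the scale rises linearly until n_warmup_steps and then decays as the inverse square root of the step. A minimal standalone sketch of the rule above, assuming init_lr = d_model ** -0.5 (the usual choice; the post does not show how init_lr is defined) and using d_model=512 and n_warmup_steps=4000 from the Namespace; lr_at is just a helper name for this sketch:

    import numpy as np

    d_model = 512          # from the Namespace above
    n_warmup_steps = 4000  # from the Namespace above
    init_lr = np.power(d_model, -0.5)  # assumption: standard Transformer init_lr, not shown in the post

    def lr_at(step):
        # same rule as _get_lr_scale(): min(step^-0.5, step * n_warmup_steps^-1.5)
        scale = np.min([np.power(step, -0.5),
                        np.power(n_warmup_steps, -1.5) * step])
        return init_lr * scale

    for step in (1, 1000, 4000, 7034):
        print(f"step {step:5d}: lr = {lr_at(step):.2e}")

Note that with n_warmup_steps=4000 and 8883 batches per epoch, the learning rate is still rising for roughly the first half of epoch 0.
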
I am using a Transformer model to train machine translation, but during the first epoch I ran into the strange phenomenon shown above.

  1. Why does the loss increase?
  2. Why does the accuracy decrease?

It is not necessary for the loss to decrease for every batch within an epoch (it can go up on individual batches), but it should decrease across epochs.
If your loss is not decreasing across epochs, the learning rate could be the problem.

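One way to sanity-check this is to compare the mean loss per epoch rather than watching individual batches. A rough sketch, where n_epochs, train_loader and train_step are placeholders for your own training loop (not part of the code above):

    epoch_losses = []
    for epoch in range(n_epochs):                   # placeholder: number of epochs
        batch_losses = []
        for batch in train_loader:                  # placeholder: your DataLoader
            batch_losses.append(train_step(batch))  # placeholder: forward/backward/step, returns the batch loss
        epoch_losses.append(sum(batch_losses) / len(batch_losses))
        print(f"epoch {epoch}: mean training loss = {epoch_losses[-1]:.4f}")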

Yeah, the learning rate was the problem.
I set the lr to min(self.init_lr * self._get_lr_scale(), 0.00001) and the phenomenon disappeared.
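
For anyone who wants to try the same workaround, the cap described above would look roughly like this inside _update_learning_rate (the 0.00001 ceiling is just the value chosen here, not a universal recommendation):

    def _update_learning_rate(self):
        ''' Learning rate scheduling per step, with an upper bound on the lr '''
        self.n_current_steps += 1
        # cap the scheduled learning rate so the warmup cannot push it too high
        lr = min(self.init_lr * self._get_lr_scale(), 0.00001)
        for param_group in self._optimizer.param_groups:
            param_group['lr'] = lr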