Word language model gives very low perplexity (3.70) with convolution layer

I customized the PyTorch word language model example (my code: https://github.com/ttpro1995/custom_word_language_model).

I added a convolution layer before the LSTM (see the ModelWrapper class in https://github.com/ttpro1995/custom_word_language_model/blob/master/model/model_wrapper.py).
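
For reference, here is a minimal sketch of what an embedding → Conv1d → LSTM → decoder stack might look like. The class and parameter names below are illustrative only; the actual ModelWrapper in the linked repo may be organized differently:

    import torch.nn as nn

    class ConvLSTMModel(nn.Module):
        """Illustrative embedding -> Conv1d -> LSTM -> decoder stack."""
        def __init__(self, ntoken, emsize=300, nhid=168, nlayers=2, dropout=0.5):
            super().__init__()
            self.drop = nn.Dropout(dropout)
            self.encoder = nn.Embedding(ntoken, emsize)   # frozen during training
            # Conv1d works on (batch, channels, seq_len); kernel_size=3 with
            # padding=1 keeps the sequence length unchanged.
            self.conv_module = nn.Conv1d(emsize, emsize, kernel_size=3, padding=1)
            self.rnn = nn.LSTM(emsize, nhid, nlayers, dropout=dropout)
            self.decoder = nn.Linear(nhid, ntoken)

        def forward(self, input, hidden):
            # input: (seq_len, batch) of token ids, as in the PyTorch example
            emb = self.drop(self.encoder(input))          # (seq_len, batch, emsize)
            conv_in = emb.permute(1, 2, 0)                # (batch, emsize, seq_len)
            conv_out = self.conv_module(conv_in)
            conv_out = conv_out.permute(2, 0, 1)          # (seq_len, batch, emsize)
            output, hidden = self.rnn(conv_out, hidden)
            decoded = self.decoder(self.drop(output))
            return decoded, hidden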

In main.py, I freeze the encoder (embedding) and only train the convolution, LSTM, and decoder:

    # Manual SGD update (p <- p - lr * grad); the encoder's parameters are
    # deliberately skipped so the embedding stays frozen.
    for p in model.conv_module.parameters():
        p.data.add_(-lr, p.grad.data)

    for p in model.rnn.parameters():
        p.data.add_(-lr, p.grad.data)

    for p in model.decoder.parameters():
        p.data.add_(-lr, p.grad.data)
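
The manual updates above mirror the original word_language_model example. As an alternative sketch, assuming the same module names (encoder, conv_module, rnn, decoder), the embedding can also be kept frozen by turning off its gradients and handing only the remaining parameters to an optimizer:

    import torch.optim as optim

    # Stop gradients from being computed for the embedding weights.
    for p in model.encoder.parameters():
        p.requires_grad = False

    # The optimizer then only ever sees the trainable parameters
    # (conv_module, rnn, decoder).
    optimizer = optim.SGD(
        [p for p in model.parameters() if p.requires_grad], lr=lr)

    # Per batch:
    #   optimizer.zero_grad()
    #   output, hidden = model(data, hidden)
    #   criterion(output.view(-1, ntokens), targets).backward()
    #   optimizer.step()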

I got a test perplexity of 3.70, which seems far too low.
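
For context, perplexity is just the exponential of the average per-word cross-entropy, and word-level LSTM baselines on standard corpora such as PTB or WikiText-2 typically land around 80-150, so 3.70 (about 1.31 nats per word) looks implausibly low. A quick check of the relationship (plain Python, numbers are illustrative):

    import math

    test_loss = 1.308           # illustrative per-word cross-entropy in nats
    ppl = math.exp(test_loss)   # perplexity = exp(average negative log-likelihood)
    print(round(ppl, 2))        # -> 3.7

    # Going the other way: a reported ppl of 3.70 implies only ~1.31 nats of
    # uncertainty per word, i.e. the model behaves as if it were choosing
    # among roughly 4 equally likely words at each step.
    print(round(math.log(3.70), 2))  # -> 1.31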

Run command:

    python main.py --cuda --emsize 300 --nhid 168 --dropout 0.5 --epochs 40 --noglove

Log file: https://gist.github.com/e7644ad05836b6a147cb243e3764ff1f

Please tell me if anything looks wrong.