How to deal with a large vocabulary in PyTorch?

tonyyuango · May 10, 2017, 6:25am

I want to implement an RNN-based translation model with a vocabulary of more than 100k words. Training takes too long, probably because of the softmax computation at the output layer.

Noise contrastive estimation (NCE) looks like a good solution (Mnih, A., & Teh, Y. W. (2012). A fast and simple algorithm for training neural probabilistic language models. arXiv preprint arXiv:1206.6426), but PyTorch doesn’t provide an NCE loss function yet. Is there a way to work around this?
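To make the idea concrete, here is a rough sketch of what an NCE-style output layer could look like. It is only an illustration under several assumptions: the class name `NCELoss`, its arguments, and the uniform noise distribution in the usage lines are all hypothetical (a unigram distribution is the more usual choice), and it assumes a reasonably recent PyTorch where `binary_cross_entropy_with_logits` accepts `reduction=`. Instead of a softmax over all 100k words, each target word is scored against a small set of sampled noise words with a binary classifier.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NCELoss(nn.Module):
    """Hypothetical NCE-style output layer; not part of PyTorch itself."""

    def __init__(self, hidden_size, vocab_size, noise_probs, num_noise=64):
        super().__init__()
        # Output word vectors and biases, looked up only for the sampled words.
        self.weight = nn.Embedding(vocab_size, hidden_size)
        self.bias = nn.Embedding(vocab_size, 1)
        # Noise distribution q(w); must be strictly positive (assumption).
        self.register_buffer("noise_probs", noise_probs)
        self.num_noise = num_noise

    def forward(self, hidden, target):
        # hidden: (batch, hidden_size), target: (batch,)
        batch = hidden.size(0)
        # Draw k noise words per example from q.
        noise = torch.multinomial(
            self.noise_probs, batch * self.num_noise, replacement=True
        ).view(batch, self.num_noise)
        # Column 0 holds the true word, columns 1..k hold the noise words.
        words = torch.cat([target.unsqueeze(1), noise], dim=1)
        w = self.weight(words)            # (batch, 1 + k, hidden_size)
        b = self.bias(words).squeeze(2)   # (batch, 1 + k)
        scores = torch.bmm(w, hidden.unsqueeze(2)).squeeze(2) + b
        # NCE logit: model score minus log(k * q(w)).
        logits = scores - torch.log(self.num_noise * self.noise_probs[words])
        labels = torch.zeros_like(logits)
        labels[:, 0] = 1.0  # only the true word is a positive example
        # Binary cross-entropy summed over words, averaged over the batch.
        return F.binary_cross_entropy_with_logits(
            logits, labels, reduction="sum"
        ) / batch


# Toy usage with made-up sizes and a uniform noise distribution.
torch.manual_seed(0)
vocab_size, hidden_size = 1000, 16
noise_probs = torch.full((vocab_size,), 1.0 / vocab_size)
criterion = NCELoss(hidden_size, vocab_size, noise_probs, num_noise=8)
hidden = torch.randn(4, hidden_size)           # e.g. RNN outputs
target = torch.randint(0, vocab_size, (4,))    # gold next-word ids
loss = criterion(hidden, target)
loss.backward()
```

The cost per step then scales with the number of noise samples rather than the vocabulary size. At test time the full softmax (or just the unnormalized scores) would still be needed to rank all words.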

smth · May 13, 2017, 4:25am

See: https://github.com/pytorch/pytorch/issues/1362

tonyyuango · May 15, 2017, 5:18am

Thank you for sharing that thread. There doesn’t seem to be a good solution at the moment…