How to deal with a large vocabulary in PyTorch?

I want to implement an RNN-based translation model with a vocabulary of more than 100k words. Training takes too long, probably because of the softmax computation at the output layer, which has to score every word in the vocabulary at each step.

I found that noise contrastive estimation (NCE) should be a good solution (Mnih, A., & Teh, Y. W. (2012). A fast and simple algorithm for training neural probabilistic language models. arXiv preprint arXiv:1206.6426.), but PyTorch doesn’t provide an NCE loss function yet. Is there a way to work around this?


Thank you for sharing that thread. There doesn’t seem to be a good solution at the moment…
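
In the meantime, a hand-written approximation might be enough to get training time down. Below is a minimal sketch of a negative-sampling loss, which is a simplified relative of NCE and not the exact method from the paper: it scores only the true word plus a few sampled noise words instead of the full vocabulary. The class name `NegativeSamplingLoss`, the `num_noise` parameter, and the uniform noise distribution are my own assumptions for illustration; proper NCE would also correct the scores with the noise distribution's log-probabilities, and a unigram noise distribution usually works better than uniform.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NegativeSamplingLoss(nn.Module):
    """Hypothetical helper: logistic loss over 1 true word + k sampled noise words."""

    def __init__(self, hidden_size, vocab_size, num_noise=64):
        super().__init__()
        # Output word vectors and biases, looked up only for the sampled ids.
        self.out_embed = nn.Embedding(vocab_size, hidden_size)
        self.out_bias = nn.Embedding(vocab_size, 1)
        self.vocab_size = vocab_size
        self.num_noise = num_noise

    def forward(self, hidden, target):
        # hidden: (batch, hidden_size) decoder states; target: (batch,) word ids
        batch = hidden.size(0)
        # Assumption: uniform noise; sampled ids may occasionally equal the target.
        noise = torch.randint(
            0, self.vocab_size, (batch, self.num_noise), device=hidden.device
        )
        ids = torch.cat([target.unsqueeze(1), noise], dim=1)   # (batch, 1 + k)
        w = self.out_embed(ids)                                # (batch, 1 + k, hidden)
        b = self.out_bias(ids).squeeze(2)                      # (batch, 1 + k)
        scores = torch.bmm(w, hidden.unsqueeze(2)).squeeze(2) + b
        # Column 0 is the true word (label 1); the rest are noise (label 0).
        labels = torch.zeros_like(scores)
        labels[:, 0] = 1.0
        return F.binary_cross_entropy_with_logits(scores, labels)


# Usage with made-up sizes: 32 decoder states, 100k-word vocabulary.
loss_fn = NegativeSamplingLoss(hidden_size=256, vocab_size=100_000, num_noise=64)
hidden = torch.randn(32, 256)
target = torch.randint(0, 100_000, (32,))
loss = loss_fn(hidden, target)
loss.backward()
```

Each step then costs on the order of `num_noise + 1` dot products per example rather than one per vocabulary word. This only speeds up training; at test time the full softmax (or a beam search over full scores) would still be needed.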