Adaptive Softmax

Are there any plans to include the adaptive softmax function described in the paper “Efficient softmax approximation for GPUs” in PyTorch? http://arxiv.org/abs/1609.04309
GitHub repo: https://github.com/facebookresearch/adaptive-softmax
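
For context, the core trick in the paper is to split a frequency-sorted vocabulary into a small head of frequent words plus a few tail clusters reached through low-rank projections, so most training steps only touch the small head matrix. Here is a rough, illustrative sketch of a two-level version (the class name, cutoffs, and projection sizes are my own choices, not from the authors' code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveSoftmaxSketch(nn.Module):
    """Illustrative two-level adaptive softmax (head + tail clusters).

    Assumes word ids are sorted by decreasing frequency. The head predicts
    the frequent words plus one pseudo-token per tail cluster; each tail
    cluster sits behind a shrinking low-rank projection.
    """

    def __init__(self, hidden_size, cutoffs, vocab_size, proj_factor=4):
        super().__init__()
        self.cutoffs = [0] + list(cutoffs) + [vocab_size]
        n_clusters = len(self.cutoffs) - 2
        self.head = nn.Linear(hidden_size, self.cutoffs[1] + n_clusters)
        self.tails = nn.ModuleList()
        for i in range(n_clusters):
            proj_size = max(1, hidden_size // (proj_factor ** (i + 1)))
            tail_size = self.cutoffs[i + 2] - self.cutoffs[i + 1]
            self.tails.append(nn.Sequential(
                nn.Linear(hidden_size, proj_size, bias=False),
                nn.Linear(proj_size, tail_size),
            ))

    def log_prob(self, hidden):
        # Full log-probabilities over the whole vocabulary (for evaluation):
        # p(word in cluster i) = p(cluster i | head) * p(word | cluster i).
        head_logp = F.log_softmax(self.head(hidden), dim=-1)
        head_words = self.cutoffs[1]
        parts = [head_logp[..., :head_words]]
        for i, tail in enumerate(self.tails):
            cluster_logp = head_logp[..., head_words + i : head_words + i + 1]
            parts.append(cluster_logp + F.log_softmax(tail(hidden), dim=-1))
        return torch.cat(parts, dim=-1)

# usage: full log-probs for a batch of hidden states
asm = AdaptiveSoftmaxSketch(hidden_size=512, cutoffs=[2000, 10000], vocab_size=50000)
print(asm.log_prob(torch.randn(8, 512)).shape)  # torch.Size([8, 50000])
```

The speedup at training time comes from only evaluating the head plus the one tail cluster containing each target, rather than a full `hidden_size × vocab_size` matmul; the sketch above only shows the evaluation path.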

Hi, somebody has already implemented adaptive softmax in PyTorch (MIT license), see: https://github.com/rosinality/adaptive-softmax-pytorch

There’s a bug there, see comment here: https://gist.github.com/rosinality/0cdd8d6adb8463961f50bd1845faddf8#gistcomment-2272578

I’m currently running a language model experiment with this, and it seems to be working, but I don’t know enough to fully assess the quality of this implementation.

Thanks. Yes, it’s the only implementation I found other than the Lua package released by the authors. I was looking for a PyTorch module I could use off the shelf without getting into too much detail, but it turns out I’ll have to. Could you comment on your perplexity score, vocabulary size, and the relative speedup? Just curious whether it’s worth the trouble.

Sure, I can get back to you tomorrow on the relative speeds for both training and inference. The model is not for English or any public dataset, though. The vocabulary size is 800k.
Speed is also the relevant factor for us, but integrating the example above into an existing PyTorch model took less than an hour, so the development cost is not that high.
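
For anyone wanting to try this later: PyTorch itself now ships an `nn.AdaptiveLogSoftmaxWithLoss` module, so an integration can be as small as the sketch below. The hidden size, cutoffs, and batch here are illustrative, and the module assumes word ids are sorted by decreasing frequency:

```python
import torch
import torch.nn as nn

hidden_size, vocab_size = 512, 800_000  # illustrative sizes

# drop-in replacement for nn.Linear(hidden_size, vocab_size) + cross-entropy;
# word ids must be ordered by decreasing frequency for the cutoffs to make sense
criterion = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden_size,
    n_classes=vocab_size,
    cutoffs=[2_000, 20_000, 200_000],  # head / tail-cluster boundaries
    div_value=4.0,                     # how quickly tail projections shrink
)

hidden = torch.randn(32, hidden_size)          # e.g. flattened RNN outputs
targets = torch.randint(0, vocab_size, (32,))  # gold word ids

out = criterion(hidden, targets)
out.loss.backward()  # out.output holds per-token target log-probabilities

# at evaluation time, full log-probs for computing perplexity:
log_probs = criterion.log_prob(hidden)  # shape (32, vocab_size)
```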

For anyone interested, I wrote a blog post explaining the adaptive softmax, with a PyTorch implementation: https://towardsdatascience.com/speed-up-your-deep-learning-language-model-up-to-1000-with-the-adaptive-softmax-part-1-e7cc1f89fcc9
