WTA Softmax for Extreme Learning with thousands or millions of class labels

Mustafa_Qamaruddin · May 4, 2018, 11:10pm

I have been looking for a PyTorch, TensorFlow, or NumPy implementation of Winner-Take-All (WTA) Softmax. I have read that it helps with extreme learning with millions of class labels, a case that comes up quite often in production. I also found some benchmarks showing that it outperforms Hierarchical Softmax. It is also simple to apply to vision problems.

Can anyone help?