I have been looking for a PyTorch, TensorFlow, or NumPy implementation of Winner-Take-All (WTA) softmax. I have read that it helps with extreme classification, where there are millions of class labels, a case that comes up quite often in production. I also found some benchmarks showing that it outperforms hierarchical softmax, and it is supposed to be simple to apply to vision problems.
Can anyone help?