Fast implementation of GloVe

I implemented GloVe with PyTorch:

I focused on speed on the GPU:

  • No large data transfers between the CPU and the GPU during training
  • All embeddings are stored in a matrix
  • Large batch size
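The three design points can be illustrated with a minimal PyTorch sketch. This is not the author's script. The function and class names, the AdaGrad optimizer, the toy corpus, and the hyperparameters are assumptions (x_max = 100 and alpha = 0.75 follow the GloVe paper's weighting function). All co-occurrence triples are copied to the device once, shuffled there with `torch.randperm`, and sliced into large batches, so only one scalar per epoch travels back to the CPU.

```python
import torch


def cooccurrence(tokens, vocab_size, window=2):
    """Symmetric co-occurrence counts, weighted by 1/distance, as (i, j, x) triples."""
    counts = torch.zeros(vocab_size, vocab_size)
    for pos, center in enumerate(tokens):
        for d in range(1, window + 1):
            if pos + d < len(tokens):
                ctx = tokens[pos + d]
                counts[center, ctx] += 1.0 / d
                counts[ctx, center] += 1.0 / d
    i, j = counts.nonzero(as_tuple=True)  # keep only nonzero entries
    return i, j, counts[i, j]


class GloVe(torch.nn.Module):
    def __init__(self, vocab_size, dim):
        super().__init__()
        # Every vector and bias lives in one embedding matrix
        self.w = torch.nn.Embedding(vocab_size, dim)   # word vectors
        self.wc = torch.nn.Embedding(vocab_size, dim)  # context vectors
        self.b = torch.nn.Embedding(vocab_size, 1)     # word biases
        self.bc = torch.nn.Embedding(vocab_size, 1)    # context biases
        for emb in (self.w, self.wc):
            torch.nn.init.uniform_(emb.weight, -0.5 / dim, 0.5 / dim)
        for emb in (self.b, self.bc):
            torch.nn.init.zeros_(emb.weight)

    def forward(self, i, j, x, x_max=100.0, alpha=0.75):
        # f(x) = min(1, (x / x_max) ** alpha)
        weight = torch.clamp(x / x_max, max=1.0) ** alpha
        pred = (self.w(i) * self.wc(j)).sum(dim=1) + self.b(i).squeeze(1) + self.bc(j).squeeze(1)
        # Weighted squared error against log co-occurrence, averaged over the batch
        return (weight * (pred - torch.log(x)) ** 2).mean()


def train(i, j, x, vocab_size, dim=50, epochs=10, batch_size=4096, lr=0.05, device="cpu"):
    # Copy all triples to the device once; batches are slices of these tensors
    i, j, x = i.to(device), j.to(device), x.to(device)
    model = GloVe(vocab_size, dim).to(device)
    opt = torch.optim.Adagrad(model.parameters(), lr=lr)
    n = x.numel()
    losses = []
    for _ in range(epochs):
        perm = torch.randperm(n, device=device)  # shuffle on the device
        total = torch.zeros((), device=device)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            loss = model(i[idx], j[idx], x[idx])
            opt.zero_grad()
            loss.backward()
            opt.step()
            total += loss.detach() * idx.numel()  # accumulate without syncing
        losses.append(total.item() / n)  # one scalar copied back per epoch
    return losses


if __name__ == "__main__":
    torch.manual_seed(0)
    vocab_size = 20
    tokens = torch.randint(0, vocab_size, (2000,)).tolist()
    i, j, x = cooccurrence(tokens, vocab_size)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(train(i, j, x, vocab_size, device=device))
```

Accumulating the epoch loss as a device tensor instead of calling `loss.item()` each step avoids forcing a GPU synchronization on every batch.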

I train on the newsgroup dataset in 15 s with a GTX 1080. The original implementation needs several hours.

Feedback welcome! 🙂


This is super cool, thanks for sharing.


I am trying to validate my installation. I have a GTX 1080 with CUDA 8.0 and cuDNN 5.1.
With my setup, your script took 18 s per epoch.
Is the 15 s a per-epoch time, or the time for all 10 epochs?
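Reporting both numbers avoids this kind of ambiguity. As a hedged sketch (not part of the original script), each epoch can be timed separately. `timed_epochs` and `run_epoch` are hypothetical names: `run_epoch` stands for any zero-argument callable that runs one full epoch. Because CUDA kernels launch asynchronously, the sketch calls `torch.cuda.synchronize()` before reading the clock when running on a GPU.

```python
import time

import torch


def timed_epochs(run_epoch, epochs, device="cpu"):
    """Time each epoch separately; return (per_epoch_seconds, total_seconds)."""

    def sync():
        # Wait for queued CUDA kernels so the clock measures finished work
        if torch.device(device).type == "cuda":
            torch.cuda.synchronize()

    per_epoch = []
    for _ in range(epochs):
        sync()
        start = time.perf_counter()
        run_epoch()
        sync()
        per_epoch.append(time.perf_counter() - start)
    return per_epoch, sum(per_epoch)


if __name__ == "__main__":
    a = torch.randn(512, 512)
    per_epoch, total = timed_epochs(lambda: a @ a, epochs=10)
    print(f"{total / len(per_epoch):.4f} s/epoch, {total:.4f} s for {len(per_epoch)} epochs")
```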