Mini-batch training for Word Embeddings example


I am trying to modify the example shown here to take a mini-batch as input instead of looping through one n-gram pair at a time. I tried doing this by modifying the layer input sizes, but convergence then slows down dramatically and essentially stalls. Is there a right way to do mini-batch training here using PyTorch?


This is the exact issue I am having. Did you ever get this figured out?
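
Not sure this matches your exact code, but here is a minimal sketch of what has worked for me, assuming the example in question is an n-gram language model along the lines of the PyTorch word embeddings tutorial. The class name, toy sentence, batch size, and learning rate below are my own placeholders. Rather than changing the layer input sizes, keep the layers as they are and add a batch dimension: feed a `(batch, context_size)` tensor of word indices and reshape the embeddings with `view(inputs.size(0), -1)` so each row stays a separate example. A hard-coded `view((1, -1))` would flatten the whole batch into a single row. Also note that `nn.NLLLoss` averages over the batch by default, so you get fewer, averaged updates per epoch; a learning rate tuned for one example at a time may look like it has stalled.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

CONTEXT_SIZE = 2     # assumption: two context words predict the next word
EMBEDDING_DIM = 10   # assumption: small embedding size for the toy data
BATCH_SIZE = 4       # assumption: arbitrary mini-batch size


class NGramLanguageModeler(nn.Module):
    """Hypothetical model; layer sizes do not depend on the batch size."""

    def __init__(self, vocab_size, embedding_dim, context_size):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear1 = nn.Linear(context_size * embedding_dim, 128)
        self.linear2 = nn.Linear(128, vocab_size)

    def forward(self, inputs):
        # inputs: LongTensor of shape (batch, context_size)
        # Keep the batch dimension; concatenate context embeddings per row.
        embeds = self.embeddings(inputs).view(inputs.size(0), -1)
        out = F.relu(self.linear1(embeds))
        # Normalise over the vocabulary dimension, not the batch dimension.
        return F.log_softmax(self.linear2(out), dim=1)


# Toy corpus (made up for illustration only).
text = "the quick brown fox jumps over the lazy dog and the quick cat".split()
vocab = sorted(set(text))
word_to_ix = {w: i for i, w in enumerate(vocab)}

# Build all (context, target) pairs up front as tensors.
contexts = torch.tensor(
    [[word_to_ix[text[i]], word_to_ix[text[i + 1]]] for i in range(len(text) - 2)]
)
targets = torch.tensor([word_to_ix[text[i + 2]] for i in range(len(text) - 2)])

torch.manual_seed(0)
model = NGramLanguageModeler(len(vocab), EMBEDDING_DIM, CONTEXT_SIZE)
loss_function = nn.NLLLoss()  # default reduction averages over the batch
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(100):
    perm = torch.randperm(len(targets))  # reshuffle every epoch
    total = 0.0
    for start in range(0, len(targets), BATCH_SIZE):
        idx = perm[start:start + BATCH_SIZE]
        optimizer.zero_grad()
        log_probs = model(contexts[idx])          # (batch, vocab_size)
        loss = loss_function(log_probs, targets[idx])
        loss.backward()
        optimizer.step()
        total += loss.item() * len(idx)
    losses.append(total / len(targets))           # mean loss per example
```

If the loss still barely moves with your data, I would first try a larger learning rate or more epochs before touching the architecture, since each epoch now performs roughly `1 / BATCH_SIZE` as many updates as the one-pair-at-a-time loop.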