Torchtext BOW SGD gradient explosion

I am trying to reimplement the torchtext text-classification example.

The example is simple and clear; however, it doesn't use the standard torchtext iterators. Instead, it has its own data loader and batch generator (and, it seems, for a reason).
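That reason may be the batch format itself. A plausible sketch of what such a custom batch generator produces, assuming the example pools with `EmbeddingBag`-style offsets (the function name and values below are hypothetical, not the tutorial's code): all token ids are concatenated into one flat list, with an offsets list marking where each example starts, so no padding is ever needed.

```python
def make_bow_batch(sequences):
    """Flatten variable-length sequences into one flat id list plus offsets.

    This is the padding-free batch format an EmbeddingBag-style
    bag-of-words model consumes: offsets[i] is where example i starts.
    """
    ids, offsets = [], []
    pos = 0
    for seq in sequences:
        offsets.append(pos)
        ids.extend(seq)
        pos += len(seq)
    return ids, offsets

ids, offsets = make_bow_batch([[5, 7], [3], [9, 9, 2]])
print(ids)      # flat token ids
print(offsets)  # start of each example
```

Because every batch carries exactly the real tokens and nothing else, batch composition cannot change an example's features, which is not true once padding enters the picture.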

I tried to reimplement the code with the standard Iterator, but this led to gradient explosion. Then I tried BucketIterator, which groups strings of similar length together; at least it didn't explode, but it didn't learn either.
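A minimal plain-Python sketch (hypothetical 2-d embeddings, not real model weights) of why the amount of padding in a batch matters here: with a sum-pooled bag of words, every `<pad>` position adds the pad embedding to the sentence vector, so the pad contribution scales with how heavily a batch is padded. A random Iterator batch can pad a short sentence to the length of the longest one, while BucketIterator keeps padding small but still non-zero.

```python
# Hypothetical 2-d embeddings; index 0 is <pad>.
# Note the pad row is non-zero, as with default random embedding init.
EMB = {0: [0.1, -0.2],
       1: [1.0, 0.5],
       2: [-0.3, 0.8]}

def bow_sum(token_ids):
    """Sum-pool token embeddings, pad positions included."""
    vec = [0.0, 0.0]
    for t in token_ids:
        vec[0] += EMB[t][0]
        vec[1] += EMB[t][1]
    return vec

sentence = [1, 2]                  # the real content
bucketed = sentence + [0] * 2      # padded to length 4 (similar-length bucket)
random_b = sentence + [0] * 18     # padded to length 20 (mixed-length batch)

print(bow_sum(sentence))   # pure signal
print(bow_sum(bucketed))   # mild pad contamination
print(bow_sum(random_b))   # pad term dominates the feature vector
```

The same sentence gets a different feature vector in every batch, and gradients flow into the pad embedding on every step, so the pad row drifts and the contamination compounds during training.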

Obviously, padding is useless for this architecture, but I didn't find a way to disable it in torchtext. On the other hand, why does padding work so badly with SGD? When I replaced SGD with Adam, everything went smoothly.
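One way to make the padding inert without torchtext support, sketched in plain Python with the same hypothetical embeddings as above (this is my workaround idea, not the example's code): mask out pad positions and mean-pool over the real length only. The same effect can be had in PyTorch by zeroing the pad row via `padding_idx` in `nn.Embedding` and normalizing by true lengths.

```python
PAD = 0
# Hypothetical 2-d embeddings; index 0 is <pad>.
EMB = {0: [0.1, -0.2],
       1: [1.0, 0.5],
       2: [-0.3, 0.8]}

def bow_mean_masked(token_ids):
    """Mean-pool embeddings over non-pad tokens only."""
    real = [t for t in token_ids if t != PAD]
    if not real:
        return [0.0, 0.0]
    vec = [0.0, 0.0]
    for t in real:
        vec[0] += EMB[t][0]
        vec[1] += EMB[t][1]
    return [vec[0] / len(real), vec[1] / len(real)]

# Same sentence, wildly different padding -> identical features.
a = bow_mean_masked([1, 2])
b = bow_mean_masked([1, 2] + [PAD] * 18)
assert a == b
print(a)
```

With the pad contribution removed, the feature vector no longer depends on batch composition, which should take the batching strategy out of the equation regardless of optimizer.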

A lot of questions and no answers.