Hi all,
I implemented Dyer’s stack-LSTM dependency parser (with in-house modifications) in PyTorch 0.3.0.post4 and achieved results (UAS, LAS) very close to those in the paper.
Recently, I updated PyTorch to 0.4.1 and followed the official migration guide.
However, I observed the following:
- UAS dropped by almost 7%.
- Training time per epoch was cut in half (2x faster).
It’s very strange; I have checked my code many times over almost 4 days but found nothing suspicious.
I can’t share my code at this point because it’s shared with other people, but I’ll write down as many details of it here as possible.
The model has the following modules & features:
- four 2-layer LSTMs (no truncation)
- in-house Tree-LSTMs.
- layer normalization between layers
- dropout for all non-recurrent modules
- pre-trained embeddings
- uniform initialization (e.g. param.data.uniform_(-0.1, 0.1)) except for layer_norms
- Adadelta optimizer with weight decay
- a lot of split, cat operations
- KLDiv loss with label smoothing (sketch after this list)
- the computational graph is very dynamic because batches are run over trees
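Since I can’t post the real code, here is a minimal sketch of how the label-smoothed KLDiv loss works; the function name and the smoothing value are illustrative, not my exact code:

```python
import torch
import torch.nn.functional as F

def label_smoothed_kldiv(logits, target, num_classes, smoothing=0.1):
    # Smoothed target distribution: (1 - smoothing) on the gold label,
    # the rest spread uniformly over the remaining classes.
    with torch.no_grad():
        smooth = torch.full_like(logits, smoothing / (num_classes - 1))
        smooth.scatter_(1, target.unsqueeze(1), 1.0 - smoothing)  # target: LongTensor of gold ids
    # KLDivLoss expects log-probabilities as input.
    return F.kl_div(F.log_softmax(logits, dim=1), smooth, reduction="sum")
```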
What I did for the migration to 0.4.1 (a combined sketch follows this list):
- Removed all Variable and volatile expressions & flags.
- Changed loss.data[0] to loss.item().
- Changed size_average=False to reduction="sum" in the loss constructors.
- Changed pre-trained embedding loading to nn.Embedding.from_pretrained(torch.load(pre), freeze=True).
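In code, the migration changes look roughly like this (simplified; the random weights stand in for my torch.load(pre) call, and the shapes are placeholders):

```python
import torch
import torch.nn as nn

# scalar extraction:
#   0.3: loss_value = loss.data[0]
#   0.4: loss_value = loss.item()

# loss reduction:
#   0.3: criterion = nn.KLDivLoss(size_average=False)
criterion = nn.KLDivLoss(reduction="sum")  # 0.4.1 replacement

# pre-trained embeddings:
weights = torch.randn(100, 50)  # stands in for torch.load(pre)
embedding = nn.Embedding.from_pretrained(weights, freeze=True)
```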
My first guess was that some disconnection occurred in the computational graph, since the training time dropped so dramatically, but I can’t find any other sign that this happened.
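To make the question concrete, this is the kind of connectivity check I have in mind (a rough sketch; the helper name is illustrative, not from my actual code):

```python
def check_graph_connectivity(loss, model):
    # A detached loss has no grad_fn at all.
    if loss.grad_fn is None:
        print("loss is detached from the computational graph!")
        return
    loss.backward()
    # Parameters that never receive a gradient sit on dead branches.
    for name, param in model.named_parameters():
        if param.requires_grad and param.grad is None:
            print("no gradient reached:", name)
```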
Does anyone have any clues?
Many thanks.