Hidden units saturate in a seq2seq model in PyTorch

OK, it seems other people are interested in this, so here is what I found.

The problem is related to this. In the 0.2.0.post3 (August) release of PyTorch, the documentation says nothing about the dim parameter of log_softmax, nor how it behaves when dim is None. If anyone spends lots of time on things like this and gets no luck, I suggest you update to master or use TensorFlow.
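To avoid the ambiguity entirely, you can always pass dim explicitly. A minimal sketch (the shapes here are just illustrative, not from my actual model):

```python
import torch
import torch.nn.functional as F

# Toy decoder output scores: (batch_size, vocab_size)
scores = torch.randn(4, 10)

# Explicit dim: normalize over the vocabulary axis, never rely on a default.
log_probs = F.log_softmax(scores, dim=-1)

# Sanity check: each row is a valid log-probability distribution,
# so exponentiating and summing over the vocab axis gives 1.
row_sums = log_probs.exp().sum(dim=-1)
```

With an explicit dim, the behavior is the same across PyTorch versions, which is what bit me here.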
