Bug or am I missing something?

Please check out the following short program that reproduces the bug I was facing.

Imagine that in each module self.params is just a transition matrix. The goal is to sum up the score of every transition in a sequence, where params[i][j] is the score of transitioning to i from j.

Both WorkingModule and BuggyModule have a forward() function that correctly computes this score.

WorkingModule does what you would expect: if you check the gradient of self.params after calling backward(), you will see 1’s at the positions where a transition occurred and 0’s elsewhere. BuggyModule, however, doesn’t backpropagate to self.params at all! The only difference is that in BuggyModule the sequence is wrapped in an autograd.Variable, and the transition indices are accessed with .data.
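To make this concrete, here is a minimal sketch of the kind of code I mean (the sizes, names, and example sequence are just illustrative placeholders, and it targets the old 0.1.x Variable API, so it is not the exact program):

```python
import torch
import torch.nn as nn
from torch import autograd


class WorkingModule(nn.Module):
    def __init__(self, n_states=5):
        super(WorkingModule, self).__init__()
        # params[i][j] = score of transitioning to i from j
        self.params = nn.Parameter(torch.randn(n_states, n_states))

    def forward(self, sequence):
        # sequence is a plain list of Python int indices
        score = self.params[sequence[1]][sequence[0]]
        for i in range(2, len(sequence)):
            score = score + self.params[sequence[i]][sequence[i - 1]]
        return score


class BuggyModule(nn.Module):
    def __init__(self, n_states=5):
        super(BuggyModule, self).__init__()
        self.params = nn.Parameter(torch.randn(n_states, n_states))

    def forward(self, sequence):
        # sequence is an autograd.Variable wrapping a LongTensor, so each
        # index is pulled out with .data (a torch.LongTensor of size 1)
        score = self.params[sequence[1].data][sequence[0].data]
        for i in range(2, sequence.size(0)):
            score = score + self.params[sequence[i].data][sequence[i - 1].data]
        return score


working = WorkingModule()
working([0, 1, 2, 1]).backward()
print(working.params.grad)   # 1's at the observed transitions, 0's elsewhere

buggy = BuggyModule()
buggy(autograd.Variable(torch.LongTensor([0, 1, 2, 1]))).backward()
print(buggy.params.grad)     # in 0.1.9 no gradient ever reaches self.params
```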

I understand the dangers of .data and how it can cut off your backprop, but how is it cutting off the backprop from score to params here? sequence is only ever used to provide indices, so in principle score should not be disconnected from params in the computation graph, unless I am missing something. In addition, I would think that sequence[i].data is evaluated before being passed to the indexing function, so I don’t see how there could be any difference at all as far as constructing the “score” computation graph is concerned.

It’s a problem that we’ve fixed recently; the second module will raise an error now. We decided to roll back some of the support for LongTensor indexing, because it wasn’t consistent with numpy and had these issues in autograd. Your first module indexes the Variable with two int objects, while the second one uses two torch.LongTensors (we now only allow a single torch.LongTensor as the last index).
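Roughly, the distinction being described is the following (an illustrative sketch against the 0.1.10-era Variable API, with made-up names and sizes, not the exact behavior of every case):

```python
import torch
from torch import autograd

params = autograd.Variable(torch.randn(5, 5), requires_grad=True)

i, j = 2, 1                   # plain Python ints, as in the first module
ok = params[i][j]             # fine: int indexing keeps the graph intact

idx = torch.LongTensor([1])   # what sequence[k].data gives you
allowed = params[2][idx]      # a single LongTensor as the last index is still allowed
broken = params[idx][idx]     # two LongTensor indices, as in the second module: now raises an error
```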

Updating to 0.1.10 should fix it.