Mysterious "trying to backward through the graph a second time" issue

Hi,

It is a bit hard to say given that we cannot really run the code.
Most likely, something you do when creating the net is a differentiable op, and its result gets reused every time you call train_batch().

A simple way to check this is to use torchviz.
You can plot the graph of supertagging_loss for the first batch, then plot the graph for the second batch. If the second graph contains the first one as a sub-graph, then something links one iteration to the next.
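Something like this (untested sketch; `net`, `loader`, `compute_supertagging_loss` and the optimizer calls are placeholders for your own training loop):

```python
from torchviz import make_dot

for i, batch in enumerate(loader):
    supertagging_loss = compute_supertagging_loss(net, batch)

    # Dump the graph for the first two batches and compare them by eye:
    # if the second graph contains the first as a sub-graph, some tensor
    # from iteration 0 is still part of the computation at iteration 1.
    if i < 2:
        dot = make_dot(supertagging_loss, params=dict(net.named_parameters()))
        dot.render(f"supertagging_loss_batch{i}", format="pdf")

    supertagging_loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if i >= 1:
        break
```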

Otherwise, you want to save the supertagging_loss from the first batch in a global variable, and then at the second batch call make_dot([first_batch_supertagging_loss, second_batch_supertagging_loss]) and make sure you get two disjoint graphs, except for the parameters (the blue ovals), which are used in both. If you have other nodes that are shared, that means that some computations are re-used from one iteration to the next.
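Roughly like this (again a sketch with placeholder names; note that make_dot takes the outputs as a tuple here, and the losses are plotted before backward is called so the graphs are still intact):

```python
from torchviz import make_dot

first_batch_loss = None

for i, batch in enumerate(loader):
    supertagging_loss = compute_supertagging_loss(net, batch)

    if i == 0:
        # Keep a reference to the first loss so we can draw it again later.
        first_batch_loss = supertagging_loss
    elif i == 1:
        # Draw both losses in a single figure. Only the parameters
        # (blue ovals) should be shared between the two halves; any other
        # shared node is a computation carried over from the first batch.
        dot = make_dot(
            (first_batch_loss, supertagging_loss),
            params=dict(net.named_parameters()),
        )
        dot.render("supertagging_loss_both_batches", format="pdf")
        break
```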