Correct way to do backpropagation through time?

Check out hooks. If you want to inspect a gradient, you can register a backward hook on a module and drop the values into a print statement or TensorBoard.
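
A minimal sketch of the gradient-inspection idea (assuming a recent PyTorch with `register_full_backward_hook`; the layer and names here are just for illustration):

    import torch
    import torch.nn as nn

    # Sketch: print the gradient norm flowing out of a layer during backprop.
    def inspect_grad(module, grad_input, grad_output):
        # grad_output holds the gradients w.r.t. the module's outputs
        print(module.__class__.__name__, grad_output[0].norm().item())

    layer = nn.Linear(10, 5)
    handle = layer.register_full_backward_hook(inspect_grad)

    layer(torch.randn(3, 10)).sum().backward()  # backward() triggers the hook
    handle.remove()  # detach the hook when you're done monitoring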

E.g., in the code below I register a forward hook to monitor the values passing through a softmax function (later I compute the entropy and pump it into TensorBoard).

        # Forward hooks receive (module, input, output); the first argument
        # is the softmax module itself, not the enclosing class.
        def monitorAttention(module, input, output):
            # Only log every 10th step to keep TensorBoard traffic down.
            if writer.global_step % 10 == 0:
                monitors.monitorSoftmax(module, input, output, ' input ', writer, dim=1)
        self.softmax.register_forward_hook(monitorAttention)
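
The body of `monitorSoftmax` isn't shown above; a hypothetical sketch of the entropy computation and logging could look like this (it assumes `writer` is a TensorBoard `SummaryWriter` on which a `global_step` attribute is tracked manually, as in the snippet above, since the stock class doesn't provide one):

    import torch

    # Hypothetical sketch of the helper used above: log the mean entropy of
    # the softmax output to TensorBoard.
    def monitorSoftmax(module, input, output, label, writer, dim=1):
        # `output` comes from nn.Softmax, so it already sums to 1 along `dim`;
        # the small epsilon guards against log(0).
        entropy = -(output * (output + 1e-12).log()).sum(dim=dim).mean()
        writer.add_scalar('entropy/' + label.strip(), entropy.item(),
                          global_step=writer.global_step)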