I am trying to debug some confusing problems with my modified sequence-to-sequence network. I have figured out that the gradients are becoming NaN at some point, so I am trying to store the gradients so that I can inspect their values at all the intermediate steps using something like pdb.
If I do not add a backward hook, my program crashes with the error I have been expecting (though it isn’t very informative; it just reports a “device side assert error” from CUDA). If I add any backward hook at all, I get a segmentation fault instead, before I get to my error checking.
For example, if I add this to my decoder class:
def save_grad(self, name):
    # name identifies the tensor whose gradient this hook receives
    def hook(grad):
        pass
    return hook
and add these lines to the very end of my decoder’s forward pass:
grads = [rnn_output, hidden, attn_weights, context, concat_output, output, mem_vec]
names = ["rnn_output", "hidden", "attn_weights", "context", "concat_output", "output", "mem_vec"]
for var, name in zip(grads, names):
    var.register_hook(self.save_grad(name))
(where rnn_output, hidden, etc. are various intermediate variables in the computation), then my program quickly terminates with the line “Segmentation Fault”. It produces no other traceback whatsoever.
I also tried this with pass replaced by a print statement, by code storing the values in a dictionary, etc., and all had the same problem. The problem also appeared when save_grad was a global function defined outside of my model class.
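For reference, the dictionary variant with a global hook factory looks roughly like the sketch below. It is a minimal, CPU-only illustration and not my actual model: the tensors x, hidden and output are toy stand-ins, and saved_grads and nan_names are names made up for this example.

```python
import torch

# Hypothetical module-level store: tensor name -> gradient seen by the hook.
saved_grads = {}


def save_grad(name):
    # Return a hook that records the gradient flowing into the named tensor.
    def hook(grad):
        saved_grads[name] = grad.detach().clone()
    return hook


# Toy stand-ins for the decoder's intermediate values (assumption, not the real model).
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
hidden = x * 2.0
output = (hidden ** 2).sum()

# Register one hook per intermediate tensor, as in the forward pass above.
for var, name in zip([hidden, output], ["hidden", "output"]):
    var.register_hook(save_grad(name))

output.backward()

# Collect the names of any saved gradients that contain NaN.
nan_names = [n for n, g in saved_grads.items() if torch.isnan(g).any()]
```

After backward() returns, saved_grads can be inspected from pdb, and nan_names lists the tensors whose gradients contain NaN (none in this toy case).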