Debugging why network parameters don't change


I have a somewhat unusual network architecture that uses two networks: the first network's output is used as the second network's weights. I can run the forward pass, .backward(), and optimizer.step() without any errors. But when I look at the first network's parameters between batches, they don't change.

I have absolutely no idea what I'm doing wrong and would like to inspect the backward computation graph to see where it stops. I've tried pytorchviz, but from what I understand it is no longer maintained.

Does anyone know how I can debug this problem?

Between backward() and step(), you can check the .grad attribute of all named_parameters() to locate the problematic segment where gradients disappear.
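A minimal sketch of that check, using a small stand-in model (the `nn.Sequential` here is hypothetical, not your architecture): a parameter whose `.grad` is `None` after `backward()` never received a gradient, i.e. it is detached from the graph.

```python
import torch
import torch.nn as nn

# hypothetical stand-in for the first network
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(2, 4)).sum()
loss.backward()

# between backward() and step(): inspect every parameter's gradient
for name, p in model.named_parameters():
    if p.grad is None:
        print(f"{name}: grad is None (detached from the graph)")
    else:
        print(f"{name}: grad norm = {p.grad.norm():.4f}")

optimizer.step()
```

Run this right inside your training loop; the first parameter (walking backwards from the loss) whose grad is `None` marks where the graph is cut.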

Or, add hooks (either to tensors or modules) to run a similar analysis on non-parameter tensors (i.e., intermediate outputs). This requires more effort, but lets you narrow the problem down to individual operations if necessary.
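A sketch of both hook styles on a toy model (the model itself is hypothetical): a tensor hook via `Tensor.register_hook` fires when that tensor's gradient is computed, and a module hook via `register_full_backward_hook` fires when gradients flow back through the module. Points the backward pass never reaches are where your graph is broken.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
x = torch.randn(2, 4)

seen = []  # records which points the backward pass actually reached

# module hook: fires when gradients flow back through this module
net[0].register_full_backward_hook(
    lambda mod, grad_in, grad_out: seen.append("first Linear module"))

# tensor hook: fires when the gradient of this intermediate output is computed
h = net[0](x)
h.register_hook(lambda grad: seen.append("intermediate tensor"))

loss = net[2](net[1](h)).sum()
loss.backward()
print(seen)  # hooks the backward pass visited
```

If a hook you registered never appears in `seen`, the gradient stopped before reaching that point.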

Though, it is likely that the link between your two networks is the problem point.
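A common cause with this architecture (an assumption about your code, since it isn't shown): if the generated weights are copied into the second network's `Parameter`s, e.g. via `.data` or under `torch.no_grad()`, the copy detaches them and no gradient ever reaches the first network. Keeping the generated tensors in the graph by using them directly in a functional call avoids this. A minimal sketch with made-up shapes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# hypothetical hypernetwork emitting the weights of a 4 -> 1 linear layer
hyper = nn.Linear(3, 4 * 1 + 1)  # 4 weight entries + 1 bias

z = torch.randn(2, 3)   # hypernetwork input
x = torch.randn(2, 4)   # second network's input

params = hyper(z).mean(0)                    # stays in the graph
w, b = params[:4].view(1, 4), params[4:]

# BROKEN: copying into a Parameter detaches the hypernetwork
# second.weight.data.copy_(w)   # gradients would never reach `hyper`

# WORKING: use the generated tensors directly in a functional call
y = F.linear(x, w, b)
y.sum().backward()
print(hyper.weight.grad is not None)  # True: the link preserved the graph
```

`torch.func.functional_call` offers the same graph-preserving pattern for whole modules instead of a single `F.linear`.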