Hi all,
I am trying to reproduce Glorot and Bengio’s paper “Understanding the difficulty of training deep feedforward neural networks” in PyTorch, and to extend the same analysis to more scenarios and more metrics.
I have a few questions about the two kinds of gradients that are analyzed and how to extract them in PyTorch:
- The weight gradients are dL/dWi
- Is the back-propagated error dL/dXi?
I think the text mentions them in the opposite order from the one in which the equations (13 and 14) are shown, which may be the source of my confusion.
Is this the right way to extract the weight gradients?
self.layer_i.weight.data.numpy()
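For comparison, here is a minimal sketch of the difference between reading a layer's weights and reading its weight gradients. In PyTorch, `.weight.data` holds the weight values themselves, while `.weight.grad` holds dL/dW once `backward()` has been called. The small network, the random inputs and the cross-entropy loss are assumptions made up for illustration, not the setup from the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical two-layer network standing in for the real model
model = nn.Sequential(nn.Linear(4, 3), nn.Tanh(), nn.Linear(3, 2))

# Made-up batch: 8 samples, 4 features, 2 classes
x = torch.randn(8, 4)
target = torch.randint(0, 2, (8,))

loss = nn.functional.cross_entropy(model(x), target)
loss.backward()  # fills .grad on every parameter

layer = model[0]
weights = layer.weight.detach().numpy()            # W_i itself, shape (3, 4)
weight_grads = layer.weight.grad.detach().numpy()  # dL/dW_i, same shape as W_i
```

Before `backward()` is called, `layer.weight.grad` is `None`, so the gradients can only be read after a backward pass.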
How can I extract the back-propagated error per layer, so that I can then plot the histograms?
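One possible approach, sketched here under assumptions: take the back-propagated error to be the gradient of the loss with respect to each layer's pre-activation (the output of the `nn.Linear` module), and keep it by calling `retain_grad()` on that intermediate tensor. The layer sizes, the tanh activation, the random data and the bin count are all hypothetical choices for illustration.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stack of layers standing in for the real model
layers = nn.ModuleList([nn.Linear(4, 3), nn.Linear(3, 3), nn.Linear(3, 2)])

# Made-up batch: 8 samples, 4 features, 2 classes
x = torch.randn(8, 4)
target = torch.randint(0, 2, (8,))

pre_activations = []
h = x
for i, layer in enumerate(layers):
    s = layer(h)     # pre-activation of layer i
    s.retain_grad()  # non-leaf tensors discard .grad unless asked to keep it
    pre_activations.append(s)
    # tanh on hidden layers, raw logits on the last layer
    h = torch.tanh(s) if i < len(layers) - 1 else s

loss = nn.functional.cross_entropy(h, target)
loss.backward()

# One flat array of dL/ds values per layer
backprop_grads = [s.grad.detach().numpy().ravel() for s in pre_activations]

# Per-layer histograms as (counts, bin_edges) pairs
histograms = [np.histogram(g, bins=20) for g in backprop_grads]
```

If the gradient with respect to the post-activation output is wanted instead, the same `retain_grad()` call can be applied to the tensor returned by `torch.tanh`.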
Thanks!