Computing per sample gradient w.r.t. last layer's parameters

AlphaBetaGamma96 · February 15, 2022, 1:42pm

Expanding upon what @anantguptadbl has stated you can get per-sample gradients via the use of registering forward_pre hooks and full_backward hooks. You can read more here