Computing per-sample gradients w.r.t. the last layer's parameters

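Expanding on what @anantguptadbl has stated, you can get per-sample gradients by registering a forward_pre hook and a full_backward hook on the last layer. The forward_pre hook records the layer's input for each sample, and the full_backward hook records the gradient of the loss w.r.t. the layer's output; for a linear layer, the per-sample weight gradient is the outer product of the two. You can read more here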
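Here is a minimal sketch of that idea, not a definitive implementation. It assumes the last layer is an `nn.Linear`; the toy model, batch size, and the names `captured`, `save_input`, and `save_grad_output` are made up for illustration. The loss uses `reduction="sum"` so that each sample's gradient is not scaled by `1/batch_size`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model; only the last layer is hooked.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
last = model[-1]

captured = {}

def save_input(module, args):
    # forward_pre hook: args is the tuple of positional inputs to the layer.
    captured["input"] = args[0].detach()            # (batch, in_features)

def save_grad_output(module, grad_input, grad_output):
    # full backward hook: grad_output[0] is dLoss/d(layer output).
    captured["grad_output"] = grad_output[0].detach()  # (batch, out_features)

h_fwd = last.register_forward_pre_hook(save_input)
h_bwd = last.register_full_backward_hook(save_grad_output)

x = torch.randn(5, 4)
y = torch.randint(0, 3, (5,))

# Sum reduction keeps each sample's contribution unscaled.
loss = nn.functional.cross_entropy(model(x), y, reduction="sum")
loss.backward()

h_fwd.remove()
h_bwd.remove()

# For y = x @ W.T + b, sample b contributes grad_output[b] (outer) input[b] to dL/dW.
per_sample_w = torch.einsum("bo,bi->boi", captured["grad_output"], captured["input"])
# The bias gradient of each sample is just that sample's grad_output.
per_sample_b = captured["grad_output"]

print(per_sample_w.shape, per_sample_b.shape)
```

Summing `per_sample_w` over the batch dimension should reproduce `last.weight.grad`, which is a handy sanity check. If you use a mean-reduced loss instead, the captured gradients will each carry the `1/batch_size` factor.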