Per-sample gradient, should we design each layer differently?

Hi,

The main roadblock to introduce this in the core library is that even though this works for some layers, it cannot be generalized easily to all operations supported by the autograd.

If you use these operations though, this package by @Yaroslav_Bulatov is what I would use.

1 Like