Hi. I'm experimenting with network optimizers and would like to modify the backpropagation process in PyTorch.
For example, I would like to save only int8 versions of the context and compute only int8 gradients, regardless of the layers or operations involved.
Is that something that can be done?
(Some hooks seem to be available on top of backprop, but I want to override it entirely: speed is the point, and hooking around the usual full-precision calculations would defeat the purpose.)
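To make the question concrete, here is a minimal sketch of the direction I have in mind, using `torch.autograd.Function` to replace what gets saved for backward. `Int8Linear` and its per-tensor symmetric quantization are my own hypothetical names and choices, not an existing API; note this only covers the saved-context part (the gradients themselves are still computed in float here, since truly int8 gradient kernels would need more machinery):

```python
import torch

class Int8Linear(torch.autograd.Function):
    """Hypothetical sketch: a matmul that saves an int8-quantized copy of
    the input for backward instead of the full-precision activation."""

    @staticmethod
    def forward(ctx, x, w):
        # Per-tensor symmetric quantization of the activation.
        scale = x.abs().max().clamp(min=1e-8) / 127.0
        x_int8 = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
        ctx.save_for_backward(x_int8, w)
        ctx.scale = scale
        return x @ w.t()

    @staticmethod
    def backward(ctx, grad_out):
        x_int8, w = ctx.saved_tensors
        # Dequantize only when the gradient actually needs the activation.
        x_approx = x_int8.float() * ctx.scale
        grad_x = grad_out @ w            # gradient w.r.t. the input
        grad_w = grad_out.t() @ x_approx # gradient w.r.t. the weight
        return grad_x, grad_w
```

What I'm unsure about is whether something like this can be applied globally, to arbitrary layers and ops, rather than hand-writing a `Function` per operation.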