Customize backprop process

Hi. I am toying with network optimizers and would like to modify the backprop process in PyTorch.

For example, I would like to save only int8 versions of the context and compute only int8 gradients, regardless of the nn layers or operations involved.

Is that something that can be done?

(Some hooks seem to be available on top of backprop, but I want to override it entirely: speed is the point here, so stacking hooks on top of the usual calculations would make no sense.)

There is a context manager that allows you to interpose on the saving of the context needed for backward.

https://pytorch.org/docs/stable/autograd.html#torch.autograd.graph.saved_tensors_hooks
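For instance, here is a minimal sketch (not a drop-in solution) that packs each floating-point tensor saved for backward into int8 with a per-tensor scale, and dequantizes it again when backward needs it. The `pack_int8`/`unpack_int8` helpers and the symmetric quantization scheme are just illustrative choices:

```python
import torch
import torch.nn as nn

def pack_int8(tensor):
    # Only quantize floating-point tensors; pass anything else through untouched.
    if not tensor.is_floating_point():
        return tensor
    # Per-tensor symmetric quantization to int8 (illustrative, lossy).
    scale = tensor.abs().amax().clamp(min=1e-8) / 127.0
    q = (tensor / scale).round().clamp(-128, 127).to(torch.int8)
    return q, scale, tensor.dtype

def unpack_int8(packed):
    if isinstance(packed, torch.Tensor):
        return packed
    q, scale, dtype = packed
    # Dequantize back to the original dtype when backward asks for the tensor.
    return q.to(dtype) * scale

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
x = torch.randn(8, 16, requires_grad=True)

# Every tensor saved for backward inside this context goes through the hooks.
with torch.autograd.graph.saved_tensors_hooks(pack_int8, unpack_int8):
    out = model(x)

out.sum().backward()
```

Note that this only changes what gets stored between forward and backward; the backward math itself still runs in the original dtype.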

Calculating the gradients only in int8 is a little trickier, however, since autograd requires the output tensors produced during forward and the gradients computed with respect to those outputs to have the same dtype.
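The usual escape hatch is a custom `torch.autograd.Function`: its backward can do whatever arithmetic it likes internally, as long as the gradients it returns are cast back to the forward dtypes. Below is a rough sketch for a linear op; `Int8Linear` and its `quantize` helper are made-up names, and since plain PyTorch has no general int8 matmul kernel, the matmuls here only simulate int8 by dequantizing to float (you would need something like `torch._int_mm` or a custom kernel to actually gain speed):

```python
import torch

class Int8Linear(torch.autograd.Function):
    """Sketch: backward quantizes its inputs to int8, but returns
    gradients in the original floating dtype, as autograd expects."""

    @staticmethod
    def forward(ctx, input, weight):
        ctx.save_for_backward(input, weight)
        return input @ weight.t()

    @staticmethod
    def backward(ctx, grad_output):
        input, weight = ctx.saved_tensors

        def quantize(t):
            # Per-tensor symmetric int8 quantization (illustrative only).
            scale = t.abs().amax().clamp(min=1e-8) / 127.0
            return (t / scale).round().clamp(-128, 127).to(torch.int8), scale

        q_grad, s_grad = quantize(grad_output)
        q_w, s_w = quantize(weight)
        q_in, s_in = quantize(input)

        # Simulated int8 matmuls: the int8 values are dequantized to float
        # for the actual multiply, so this shows the mechanics, not the speed.
        grad_input = (q_grad.float() @ q_w.float()) * (s_grad * s_w)
        grad_weight = (q_grad.float().t() @ q_in.float()) * (s_grad * s_in)

        # Cast back so the returned gradients match the forward dtypes.
        return grad_input.to(input.dtype), grad_weight.to(weight.dtype)

x = torch.randn(8, 16, requires_grad=True)
w = torch.randn(4, 16, requires_grad=True)
Int8Linear.apply(x, w).sum().backward()
```

In practice you would swap this in for the matmuls you care about and then benchmark whether the quantize/dequantize overhead actually pays off against the cheaper arithmetic.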