Training with custom, quantized datatype

Hey!

  1. Adding a new native dtype is pretty tricky and not really possible today, I’m afraid. You can use a custom Tensor subclass to do something like that, though: see the post “What (and Why) is __torch_dispatch__?” (frontend API category, PyTorch Dev Discussions) and the collection of examples in the albanD/subclass_zoo repository on GitHub.
  2. The constraint in the autograd engine is there mostly because a Tensor and its grad having different dtypes is not something that is used today, and such a mismatch is a common error when implementing backward functions. So it is safer to have this check. We are open to lifting this check for particular Tensor types (in particular the subclasses I mentioned above).
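
To make the subclass suggestion a bit more concrete, here is a minimal sketch of a wrapper subclass built on `__torch_dispatch__`. The class name `Int8QuantTensor`, its `from_float` / `to_float` helpers, and the per-tensor symmetric int8 scheme are all made up for illustration; they are not an existing PyTorch API. It also relies on `torch.Tensor._make_wrapper_subclass` and `torch.utils._pytree`, which are private helpers and may change between releases. The idea is that the tensor stores int8 data but advertises `float32` to the rest of PyTorch, and every op simply dequantizes first.

```python
import torch
from torch.utils._pytree import tree_map


class Int8QuantTensor(torch.Tensor):
    """Hypothetical wrapper subclass: holds int8 data plus a scale,
    but reports dtype float32 to the rest of PyTorch."""

    # Skip __torch_function__ so results are not re-wrapped into this class.
    __torch_function__ = torch._C._disabled_torch_function_impl

    @staticmethod
    def __new__(cls, int_data, scale):
        # The wrapper itself has no storage; it only carries metadata.
        return torch.Tensor._make_wrapper_subclass(
            cls, int_data.shape, dtype=torch.float32, device=int_data.device
        )

    def __init__(self, int_data, scale):
        self.int_data = int_data  # plain torch.int8 tensor
        self.scale = scale        # plain 0-dim float tensor

    @classmethod
    def from_float(cls, x):
        # Symmetric per-tensor quantization (an illustrative choice).
        scale = x.abs().max().clamp(min=1e-8) / 127.0
        int_data = torch.round(x / scale).clamp(-127, 127).to(torch.int8)
        return cls(int_data, scale)

    def to_float(self):
        return self.int_data.to(torch.float32) * self.scale

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}

        # Naive fallback: dequantize every wrapped input, then run the
        # regular kernel. Outputs are plain float32 tensors.
        def unwrap(t):
            return t.to_float() if isinstance(t, cls) else t

        return func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs))


w_fp = torch.randn(4, 3)
w = Int8QuantTensor.from_float(w_fp)
x = torch.randn(2, 4)
y = x @ w  # reaches __torch_dispatch__, which dequantizes w first
```

A real implementation would handle specific ops (for example, an actual int8 matmul) inside `__torch_dispatch__` instead of always dequantizing, and would need more care to work with autograd and optimizers. This sketch only shows where the custom storage and the dispatch hook go.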