Does PyTorch provide mixed-precision integer operations? For example, given two int8 tensors, can I take their dot product into an int32 accumulator without overflowing? Can I do matrix multiplication into int32 where the partial products are kept at sufficient precision to avoid overflow?
Or would I have to write these kernels from scratch at the C++ level?
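To make the question concrete, here is a minimal sketch of the manual workaround I'd like to avoid: upcasting both operands to int32 before multiplying, so every partial product is exact. (This assumes plain element-wise multiply and sum; the tensor values here are just for illustration.)

```python
import torch

a = torch.tensor([100, 100, 100], dtype=torch.int8)
b = torch.tensor([100, 100, 100], dtype=torch.int8)

# Each partial product 100 * 100 = 10000 does not fit in int8,
# so multiplying directly in int8 loses the true values.
wrapped = a * b  # result dtype is still int8

# Manual workaround: upcast first, then multiply and reduce,
# so every partial product is computed exactly in int32.
exact = (a.to(torch.int32) * b.to(torch.int32)).sum()  # dot product = 30000
```

What I'm asking is whether PyTorch can do this int8-in/int32-out accumulation natively (keeping the compact int8 storage for the inputs), rather than materializing full int32 copies of both operands as above.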