Is there an alternative to do batched matrix multiplication on Quantized Tensors?

Currently we only support quint8 for activations and qint8 for weights, I think.
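
For illustration, here is a minimal sketch of that dtype convention using `torch.quantize_per_tensor` (the scale/zero_point values are arbitrary placeholders); calling `torch.bmm` directly on such quantized tensors is what fails, since there is no quantized bmm kernel:

```python
import torch

# Activation-style quantization: quint8 (unsigned, zero_point typically > 0)
act = torch.quantize_per_tensor(torch.randn(4, 8, 16), scale=0.05, zero_point=128, dtype=torch.quint8)

# Weight-style quantization: qint8 (signed, zero_point usually 0)
wgt = torch.quantize_per_tensor(torch.randn(4, 16, 32), scale=0.02, zero_point=0, dtype=torch.qint8)

# torch.bmm(act, wgt)  # would raise an error: no quantized kernel for aten::bmm
```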

Currently we do not have plans to support bmm. One workaround is to put a DeQuantStub and QuantStub around the bmm op so that it is skipped during quantization and runs in floating point, as sketched below.
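
A minimal sketch of that workaround for eager-mode quantization (module and tensor names are hypothetical): dequantize the inputs right before `torch.bmm`, run it in fp32, then re-quantize the result so the rest of the quantized model can consume it.

```python
import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub

class BmmBlock(nn.Module):
    """Wrap torch.bmm with DeQuantStub/QuantStub so it stays in fp32."""

    def __init__(self):
        super().__init__()
        self.dequant_a = DeQuantStub()  # quantized -> fp32 before bmm
        self.dequant_b = DeQuantStub()
        self.quant = QuantStub()        # fp32 -> quantized after bmm

    def forward(self, a, b):
        # a and b are assumed to arrive quantized from earlier quantized layers.
        a = self.dequant_a(a)
        b = self.dequant_b(b)
        out = torch.bmm(a, b)           # plain fp32 bmm, not quantized
        out = self.quant(out)           # re-enter the quantized region
        return out
```

After the usual `prepare`/`convert` flow, the stubs turn into actual quantize/dequantize ops, so only the bmm itself is left in floating point while the surrounding layers stay quantized.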