Hello,
I tried applying dynamic quantization to an XLNet model for inference, and I got this error message:
RuntimeError: Could not run 'quantized::linear_dynamic' with arguments from the 'CUDA' backend. 'quantized::linear_dynamic' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, Tracer, Autocast, Batched, VmapMode].
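For context, here is a minimal sketch of the kind of setup that triggers this, using the Hugging Face transformers xlnet-base-cased as a placeholder model (my actual inputs and checkpoint may differ):

```python
import torch
from transformers import XLNetModel, XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetModel.from_pretrained("xlnet-base-cased").eval()

# Dynamically quantize the Linear layers to int8
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Moving the quantized model and inputs to GPU is what triggers the error
quantized_model.to("cuda")
inputs = tokenizer("Hello world", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = quantized_model(**inputs)  # RuntimeError: quantized::linear_dynamic ...
```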
This leads me to believe dynamic quantization doesn’t support CUDA. If so, do you plan to add CUDA support for quantization, for both training and inference? I couldn’t find any issues related to this on GitHub. Thanks!
Yeah, it is not supported on CUDA; quantized::linear_dynamic is only supported on CPU. We do not have immediate plans to support CUDA, but we plan to publish a doc for custom backends, which will make extending quantization to other backends easier.
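For now the workaround is to keep the quantized model and its inputs on CPU. A rough sketch (the model name and dummy input here are just examples):

```python
import torch
from transformers import XLNetModel

model = XLNetModel.from_pretrained("xlnet-base-cased").eval()

# quantize_dynamic swaps nn.Linear for dynamically quantized versions; the
# quantized::linear_dynamic op they dispatch to only has a CPU kernel.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Keep both the model and the inputs on CPU for inference.
input_ids = torch.randint(0, model.config.vocab_size, (1, 16))
with torch.no_grad():
    outputs = quantized_model(input_ids)
```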