Will quantization be supported for GPUs anytime soon? I have a project where inference speed is a major concern, and I would love to use quantization to speed it up.
I see the CPU quantization tutorial in the docs was written about six months ago, so I'm curious whether GPU support is on the developers' radar at all and whether we can expect it eventually, or even in the near future.
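For anyone landing here who hasn't read that tutorial: the int8 scheme it covers is asymmetric affine quantization, which can be sketched in plain Python. This is only an illustration of the quantize/dequantize math, not PyTorch's actual implementation, and the helper names (`choose_qparams`, `quantize`, `dequantize`) are mine:

```python
# Illustrative sketch of asymmetric affine int8 quantization,
# the scheme PyTorch's CPU quantization tutorial is based on.
# Helper names are hypothetical, not part of the PyTorch API.

def choose_qparams(xs, qmin=-128, qmax=127):
    """Pick a scale and zero point covering the observed float range."""
    mn, mx = min(xs), max(xs)
    mn, mx = min(mn, 0.0), max(mx, 0.0)  # the range must include 0.0
    scale = (mx - mn) / (qmax - qmin) or 1.0  # avoid scale == 0
    zero_point = int(round(qmin - mn / scale))
    return scale, max(qmin, min(qmax, zero_point))

def quantize(xs, scale, zero_point, qmin=-128, qmax=127):
    """float -> int8: q = clamp(round(x / scale) + zero_point)."""
    return [max(qmin, min(qmax, int(round(x / scale)) + zero_point))
            for x in xs]

def dequantize(qs, scale, zero_point):
    """int8 -> float: x is approximately (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in qs]

xs = [-1.5, -0.2, 0.0, 0.7, 2.3]
scale, zp = choose_qparams(xs)
roundtrip = dequantize(quantize(xs, scale, zp), scale, zp)
# each value is recovered to within half a quantization step
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(xs, roundtrip))
```

The speedup on CPU comes from doing the matmuls in int8 with a dequantize at the end; the sketch just shows why the accuracy loss is bounded by the scale.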
Is there a particular reason it is not a high priority? I am still a student, but I was under the impression that inference with large models is typically done on GPUs, where quantization would be very beneficial.
I'm not sure if there is a voting process, but we (as a company) use PyTorch in production, and the inference speed of our custom BERT model is critical for us. In my opinion, inference speed is going to be essential for wider adoption of PyTorch in production and commercial applications, and this feature would be a huge step forward. My two cents.
I am looking to contribute some work in the area of quantization for multiple architectures, including FPGAs and GPUs. Are there any suggested guides on how to get started with contributing to PyTorch? Cheers!