Given a quantized model (PTQ or QAT), I want to convert it into a model whose parameters are represented as float32, because I want to run it on the GPU. How could I achieve this?
You mean "dequantize" a model? Interesting, I don't think we support that. Why not use the original floating-point model in this case?
Thank you for the suggestion. Actually, I want to run experiments on the quantized model itself. The problem is that the quantized model does not support GPU execution, which makes the experiments quite slow.
Maybe you can take a look at the [quantization] Frequently Asked Questions to see whether something there works for you.
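In the meantime, here is a minimal sketch of the idea at the tensor level, using PyTorch's `torch.quantize_per_tensor` and `Tensor.dequantize`. Dequantizing recovers a plain float32 tensor (`x ≈ scale * (q - zero_point)`), which is the representation a GPU can work with; doing this for every quantized parameter in a model (and copying the results into a matching float model) is one possible route, not an official conversion API:

```python
import torch

# Start from a float32 tensor and quantize it (per-tensor affine, int8).
x = torch.tensor([0.5, -1.25, 3.0])
xq = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.qint8)

# Dequantize back to float32: values are recovered up to quantization
# error of at most scale / 2 per element.
x_dq = xq.dequantize()

print(x_dq.dtype)  # torch.float32, i.e. a regular tensor usable on GPU
```

The same `dequantize()` call works on the weights of quantized modules (e.g. the quantized tensor returned by a quantized `Linear`'s `weight()`), so a model-level conversion would loop over modules and rebuild float32 parameters this way.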
Thank you very much.