Given a quantized model (PTQ or QAT), I want to convert it back into a model whose parameters are represented as float32, because I want to run it on a GPU. How could I achieve this?
You mean "dequantize" a model? Interesting, I don't think we support that. Why not just use the original floating-point model in this case?
Thank you for your suggestion. Actually, I want to run experiments on the quantized model itself. The problem is that the quantized model does not run on GPU, which makes the experiments quite slow.
Thank you very much.
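For anyone hitting the same issue, here is a minimal sketch of one possible workaround, assuming the quantized parameters are standard PyTorch quantized tensors (which expose `.dequantize()` and `.is_quantized`). The helper `dequantize_state_dict` below is hypothetical, not part of the PyTorch API; you would still need to load the resulting float32 tensors into a fresh float copy of the architecture (e.g. with `load_state_dict(..., strict=False)`), since quantized modules store extra keys such as scale and zero point.

```python
import torch

def dequantize_state_dict(model):
    """Hypothetical helper: copy a model's state_dict, replacing every
    quantized tensor with its dequantized float32 equivalent."""
    out = {}
    for name, t in model.state_dict().items():
        if isinstance(t, torch.Tensor) and t.is_quantized:
            out[name] = t.dequantize()  # simulated-quantized values in float32
        else:
            out[name] = t  # bias, scale, zero_point, etc. pass through
    return out

# The core operation on a single tensor (per-tensor affine scheme):
w = torch.randn(4, 4)
qw = torch.quantize_per_tensor(w, scale=0.1, zero_point=0, dtype=torch.qint8)
fw = qw.dequantize()
print(fw.dtype)  # torch.float32
```

Note that the dequantized tensors carry the quantization error baked in (values are snapped to the qint8 grid), so this lets you study quantized behavior on GPU rather than recovering the original float weights.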