Correctly changing precision in DLRM

Hello,

I would like to change the precision used in DLRM inference. I observed that the default datatype in the original code is "float32", set in the create_emb() method. Is changing the dtype there to any of the supported values enough to correctly change the precision?

I believe the generated embedding table would be directly used in the apply_emb() function.

Thanks for the help.
Rishabh Jain

For some changes, like fp32 to fp16 or bf16, changing the dtype is sufficient, provided the ops involved support those dtypes. For actual quantization (e.g. to int8) you need to quantize your model, which takes more effort; see Quantization — PyTorch 2.2 documentation for more details.
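To illustrate the distinction, here is a hedged sketch (not the actual DLRM code, and using numpy rather than torch for self-containment): a plain dtype cast suffices for fp16, while int8 requires real quantization with a scale factor. The table shape and the symmetric per-tensor quantization scheme are illustrative assumptions.

```python
import numpy as np

# Hypothetical embedding table, analogous to what create_emb() would
# initialize in fp32 (shape chosen only for illustration).
n_rows, emb_dim = 4, 8
table_fp32 = np.random.uniform(-1.0, 1.0, size=(n_rows, emb_dim)).astype(np.float32)

# fp32 -> fp16: a plain dtype cast is enough, assuming the downstream
# ops (e.g. the lookups in apply_emb()) support that dtype.
table_fp16 = table_fp32.astype(np.float16)
assert table_fp16.nbytes == table_fp32.nbytes // 2  # half the memory

# fp32 -> int8: a plain cast would destroy the values; quantization maps
# the float range onto integers via a scale (symmetric per-tensor here).
scale = np.abs(table_fp32).max() / 127.0
table_int8 = np.round(table_fp32 / scale).astype(np.int8)

# Dequantizing recovers the values only approximately; the rounding
# error is bounded by half the scale.
dequantized = table_int8.astype(np.float32) * scale
print(np.abs(dequantized - table_fp32).max() < scale)  # True
```

This is why fp16/bf16 is "just change the dtype" while int8 needs the quantization workflow from the linked docs: the integer representation carries extra metadata (scale, zero-point) and the ops must be quantization-aware.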