Inference with float16

Hi,

I have a Linear/GRU NN implementation that will eventually run on an embedded system. As such, I need inference to be done in float16. I tried running the test with model.half(), but I got the following error:

*** RuntimeError: "compute_indices_weights_linear" not implemented for 'Half'

I read some discussions pointing out that float16 is not supported on the CPU since it wouldn't give much of a speedup. However, in my case I want to test with float16 to make sure there is no performance degradation when I port my network to the embedded environment, where everything is float16. Is there any way for me to do this other than moving to CUDA for my inference test?
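For context, roughly what I tried looks like the sketch below. The model here is a simplified, hypothetical stand-in (made-up layer sizes), not my actual network:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the actual Linear/GRU network.
class TinyGRUNet(nn.Module):
    def __init__(self, in_features=16, hidden=32, out_features=4):
        super().__init__()
        self.gru = nn.GRU(in_features, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, out_features)

    def forward(self, x):
        out, _ = self.gru(x)
        return self.fc(out[:, -1])

model = TinyGRUNet().eval()

# Cast the weights to float16 and run inference on a float16 input, on the CPU.
model_fp16 = model.half()
x = torch.randn(1, 10, 16, dtype=torch.float16)
with torch.no_grad():
    y = model_fp16(x)  # depending on the ops involved, this may raise the
                       # "not implemented for 'Half'" RuntimeError on the CPU
```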

Thanks
Yohannes

If you are indeed seeing this issue on the CPU, then your concern might be valid. You would either need to use a GPU for float16 support, or you could try bfloat16 on the CPU or GPU instead.
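Something along these lines (a minimal sketch, assuming your model and input are called `model` and `x`):

```python
import torch

# Option 1: bfloat16 on the CPU (same memory footprint as float16,
# but a wider exponent range and generally better CPU op coverage).
model_bf16 = model.to(torch.bfloat16)
x_bf16 = x.to(torch.bfloat16)
with torch.no_grad():
    y_bf16 = model_bf16(x_bf16)

# Option 2: float16 on a CUDA device, if one is available.
if torch.cuda.is_available():
    model_fp16 = model.half().cuda()
    x_fp16 = x.half().cuda()
    with torch.no_grad():
        y_fp16 = model_fp16(x_fp16)
```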

@ptrblck thanks for the recommendations. I tried bfloat16, but it also had its own issues. I went with testing on a GPU, and that works.

Not sure what the protocol is for requesting features, but for anyone considering embedded NN inference, smaller data types like float16 are very important, so it would be nice if float16 were fully supported on the CPU.