For CPU int4 weight-only quantization, see this thread: Quantized LLM inference vs quantized matrix multiplication speed in CPU - #3 by jerryzh168