High memory consumption during quantization-aware training

Hi @Georgios_Georgiadis, one known problem is that fake_quantize modules are currently implemented as additional nodes in the computation graph, so their outputs (the fake-quantized versions of the weights and activations) are extra tensors that must be kept around during training, which adds to the memory overhead. We plan to reduce this overhead in the future by adding fused fake_quant kernels for common layers such as conv and linear.
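
If it helps to see where those extra nodes come from, here is a minimal sketch using the eager-mode QAT API (exact import paths vary a bit across versions, e.g. `torch.quantization` vs. `torch.ao.quantization`); `TinyNet` is just a made-up example model:

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq


class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3)
        self.relu = nn.ReLU()
        self.fc = nn.Linear(16, 10)

    def forward(self, x):
        x = self.relu(self.conv(x))
        x = x.mean(dim=(2, 3))  # global average pool
        return self.fc(x)


model = TinyNet().train()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
prepared = tq.prepare_qat(model)

# Each fake_quant module below is an extra node in the computation graph;
# its output (a fake-quantized copy of a weight or activation) has to stay
# alive for the backward pass, which is where the extra memory goes.
fake_quants = [name for name, m in prepared.named_modules()
               if isinstance(m, tq.FakeQuantizeBase)]
print(f"{len(fake_quants)} fake_quant modules inserted:")
print("\n".join(fake_quants))
```

You should see `weight_fake_quant` and `activation_post_process` entries for the prepared layers; each one produces an extra tensor per forward pass on top of what the float model would store.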