I’m asking because I encountered an issue. After applying a sigmoid function to a feature map, I tried to perform 16-bit asymmetric quantization based on the output’s min/max values. However, the calculated zero-point was -55083, which falls outside the 16-bit integer range. This made me question whether quantizing the output of sigmoid and SiLU this way is the correct approach.
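For context, this is roughly how I compute the quantization parameters (a minimal sketch; the function names and the signed int16 range are my own assumptions, not from any particular library):

```python
import numpy as np

def asymmetric_qparams(min_val, max_val, qmin=-32768, qmax=32767):
    """Min/max asymmetric quantization parameters, assuming signed int16."""
    scale = (max_val - min_val) / (qmax - qmin)
    # Zero-point before any clamping. If min_val > 0 (as with a sigmoid output
    # whose minimum never reaches 0), this lands below qmin, i.e. outside int16.
    zero_point = round(qmin - min_val / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-32768, qmax=32767):
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int16)
```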
I recall a Qualcomm paper explaining that for activations like ReLU, you can simply apply the function directly to the already-requantized output of the previous layer (e.g., a convolution).
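If I understood that correctly, it works because ReLU reduces to a clamp at the zero-point in the quantized domain (assuming a positive scale), roughly:

```python
import numpy as np

def relu_on_quantized(q, zero_point):
    # x = scale * (q - zero_point), so ReLU(x) = scale * (max(q, zero_point) - zero_point);
    # the output keeps the same scale and zero-point as the input.
    return np.maximum(q, zero_point)
```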
So, my main question is: following a convolution and its subsequent requantization, is there a way to compute non-linear activation functions like sigmoid or SiLU directly on the quantized tensor, avoiding the usual dequantization → activation → requantization round trip?
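To make the question concrete, this is the float detour I’d like to avoid (a sketch under the same signed int16 assumption; all names are mine):

```python
import numpy as np

def sigmoid_via_float(q_in, in_scale, in_zp, out_scale, out_zp,
                      qmin=-32768, qmax=32767):
    x = in_scale * (q_in.astype(np.float32) - in_zp)  # dequantize
    y = 1.0 / (1.0 + np.exp(-x))                      # activation in float
    q_out = np.round(y / out_scale) + out_zp          # requantize
    return np.clip(q_out, qmin, qmax).astype(np.int16)
```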