Trying to generate an int8 model instead of quantising an fp32 model. Is there any way to generate an int8 model directly?
Not really: all the quantised 8-bit models are actually fake-quantised, and the backend then internally uses the integer representation of the model.
If you generated a quantised model directly, you would have to deal with overflow, since the result of a matrix-vector product between two int8 tensors would itself be an int8 tensor. To get accurate results (up to a certain matrix dimensionality), you need to accumulate into an int32 tensor before re-quantising: the int8 result is just the 8 bits starting from the LSB of the int32 result.
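A minimal NumPy sketch of this overflow behaviour (the random matrix and sizes here are just illustrative, not tied to any particular framework):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-128, 128, size=(64, 64), dtype=np.int8)
x = rng.integers(-128, 128, size=64, dtype=np.int8)

# Accumulating in int32 gives the exact matrix-vector product.
exact = A.astype(np.int32) @ x.astype(np.int32)

# Keeping everything in int8 silently wraps around on overflow.
wrapped = A @ x  # dtype stays int8

# The wrapped int8 values are exactly the 8 LSBs of the int32 accumulator.
assert np.array_equal(wrapped, exact.astype(np.int8))
print("int8 result equals the low 8 bits of the int32 result")
```

With 64 products of values up to 127 in magnitude, the true sums far exceed the int8 range [-128, 127], so the int8 path is only correct modulo 256, which is why backends accumulate in int32 and quantise afterwards.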