Hi,
I'm working on network quantization using the PyTorch 2 export flow and torchao (AO), and I would like to apply layer-specific quantization strategies to my model to maximize performance. In AO, when writing your own quantizer, you can define how to annotate (or not annotate) a module/function based on its type, but I don't see any way to change the quantization strategy based on the position of the module/function in the network. Say I have a U-Net architecture and I want to avoid quantizing the first layer of the encoder: how can I achieve this?
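
For context, here is roughly the kind of thing I have in mind, as a minimal sketch. I'm assuming each node produced by the export flow carries `nn_module_stack` metadata with the fully qualified name of the module that created it; the `UNetQuantizer` class and the `"encoder.enc1"` path are placeholders from my own model:

```python
import torch
from torch.ao.quantization.quantizer import Quantizer


class UNetQuantizer(Quantizer):
    """Sketch: skip annotating nodes that come from the first encoder layer."""

    # Fully qualified name of the layer I want to keep in floating point
    # (placeholder name from my model).
    SKIP_PREFIX = "encoder.enc1"

    def _source_fqn(self, node: torch.fx.Node) -> str:
        # nn_module_stack maps keys to (fqn, module type); the last value
        # should be the innermost module that produced this node.
        stack = node.meta.get("nn_module_stack", {})
        if not stack:
            return ""
        fqn, _module_type = list(stack.values())[-1]
        return fqn

    def annotate(self, model: torch.fx.GraphModule) -> torch.fx.GraphModule:
        for node in model.graph.nodes:
            # The exact fqn format may differ depending on how the model was
            # captured, hence a containment check rather than equality.
            if self.SKIP_PREFIX in self._source_fqn(node):
                continue  # leave the first encoder layer un-annotated (float)
            # ... annotate `node` with my global quantization strategy here ...

        return model

    def validate(self, model: torch.fx.GraphModule) -> None:
        pass
```

Is inspecting `nn_module_stack` like this the intended way to do position-based annotation, or is there a cleaner API for it?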
Thanks!