I’m currently trying to apply static quantization to several more or less modern vision architectures. It all went reasonably smoothly for EfficientNetV2. However, I hit a brick wall with ConvNeXt, getting results like these:
```
Name    Self CPU %    Self CPU    CPU total %    CPU total    CPU time avg    # of Calls
[profiler table rows missing]
```
I’m running it on an i7-10875H in single-core mode, because that’s our target mode. The latest PyTorch version I used was the 1.12.0.dev20220404 nightly.
As far as I’ve seen, depthwise separable conv2d slows down significantly after quantization, regardless of the kernel size. On top of that, ConvNeXt uses 7x7 convolutions, which send inference time through the roof. Am I cooking it wrong? Could anybody please point me to how I can fix that?
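For reference, here is a minimal sketch of the kind of setup I mean, reduced to a single 7x7 depthwise conv with eager-mode static quantization (the channel count and input size are illustrative, not the exact ConvNeXt block):

```python
import torch
import torch.nn as nn

torch.set_num_threads(1)  # single-core mode, matching the target

class DepthwiseBlock(nn.Module):
    def __init__(self, channels=96):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        # groups == channels makes this a depthwise conv
        self.conv = nn.Conv2d(channels, channels, kernel_size=7,
                              padding=3, groups=channels)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))

model = DepthwiseBlock().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)
model(torch.randn(1, 96, 56, 56))          # calibration pass
torch.quantization.convert(model, inplace=True)

out = model(torch.randn(1, 96, 56, 56))
print(out.shape)  # spatial dims preserved by padding=3
```

Profiling this block alone (e.g. with `torch.profiler`) against its float counterpart is how I got the numbers above.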
Could you share the shapes, especially the padding? Is the padding size 3 (so-called “same” padding, which gives an output with the same spatial dimensions as the input)? That would help us optimize.
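A quick check of the case being asked about: with a 7x7 kernel, padding=3 is the “same” padding that leaves H and W unchanged (channel count chosen arbitrarily here):

```python
import torch
import torch.nn as nn

# 7x7 depthwise conv with padding=3: output H, W match input H, W
conv = nn.Conv2d(96, 96, kernel_size=7, padding=3, groups=96)
x = torch.randn(1, 96, 56, 56)
y = conv(x)
print(y.shape)  # torch.Size([1, 96, 56, 56])
```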
Hi! How do you quantize a depthwise separable convolution with PyTorch? torch.quantize_dynamic does not support quantizing nn.Conv2d, and when using torch.quantization.convert it may raise an error stating ‘Quantized cuDNN Conv2d is currently limited to groups = 1’. I have no idea how to solve this. Can you provide some help?
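Dynamic quantization only covers modules like nn.Linear and LSTMs; Conv2d goes through the static (or QAT) path instead. A hedged sketch of static quantization for a depthwise separable block, kept on CPU with the fbgemm backend so the grouped conv never hits the cuDNN groups = 1 limitation (shapes are made up for illustration):

```python
import torch
import torch.nn as nn

class SeparableNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.dw = nn.Conv2d(32, 32, 3, padding=1, groups=32)  # depthwise
        self.pw = nn.Conv2d(32, 64, 1)                        # pointwise
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.pw(self.dw(self.quant(x))))

m = SeparableNet().eval()  # model (and inputs) stay on CPU
m.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(m, inplace=True)
m(torch.randn(4, 32, 28, 28))   # calibrate with representative data
torch.quantization.convert(m, inplace=True)

y = m(torch.randn(1, 32, 28, 28))
print(y.shape)
```

Note that quantized inference in eager mode runs on CPU backends (fbgemm/qnnpack); the cuDNN error appears when quantized convs end up on the CUDA path.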