I built a PyTorch model based on Conv1d. I have gone through quantization and implemented some cases as well, but all of those work on Conv2d, BatchNorm and ReLU, whereas my model is built on Conv1d and PReLU. Is this quantization valid for these network layers? Because when I ran quantization, only the layers that are included in the mapping were quantized. Let me show you the layers for which quantization is valid (i.e., which are included in the mapping).
Please find the list of modules it supports (according to the source code I went through):
(Actual layer : quantized layer)
nn.Linear: nnq.Linear,
nn.ReLU: nnq.ReLU,
nn.ReLU6: nnq.ReLU6,
nn.Conv2d: nnq.Conv2d,
nn.Conv3d: nnq.Conv3d,
nn.BatchNorm2d: nnq.BatchNorm2d,
nn.BatchNorm3d: nnq.BatchNorm3d,
QuantStub: nnq.Quantize,
DeQuantStub: nnq.DeQuantize,
We are in the process of implementing the Conv1d module and the ConvReLU1d fused module. The PR is here: https://github.com/pytorch/pytorch/pull/38438. Feel free to try out your model with the changes in this PR; the quantization flow should convert it.
We don't currently support fusion with PReLU and LayerNorm, so they will have to be executed separately.
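For illustration, here is a minimal sketch (not your model; layer names and shapes are made up) of the eager-mode flow where an unsupported op such as PReLU is bracketed by DeQuantStub/QuantStub so it runs in float between quantized layers:

import torch
import torch.nn as nn
import torch.quantization as tq

class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.conv = nn.Conv1d(8, 16, kernel_size=3)
        self.dequant = tq.DeQuantStub()
        self.prelu = nn.PReLU()
        self.requant = tq.QuantStub()
        self.conv2 = nn.Conv1d(16, 16, kernel_size=3)
        self.dequant_out = tq.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)       # float -> quint8
        x = self.conv(x)        # swapped to a quantized conv only if Conv1d is in the mapping
        x = self.dequant(x)     # back to float for the unsupported op
        x = self.prelu(x)       # runs in float
        x = self.requant(x)     # re-quantize before the next quantized layer
        x = self.conv2(x)
        return self.dequant_out(x)

model = ToyNet().eval()
model.qconfig = tq.get_default_qconfig('fbgemm')
tq.prepare(model, inplace=True)
model(torch.randn(1, 8, 64))    # calibration pass
tq.convert(model, inplace=True)
print(model)                    # only module types present in the mapping get swapped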
Fusing is optional in quantization, if I'm not wrong. We need our modules to be quantized, i.e., every layer we implemented, in order to pass our quantized parameters through them. Will the quantized model work only if all the layers are quantized? Or do we need to dequantize again before passing through a non-quantized layer?
Hey, you gave me that PR link above in the last comment for quantized Conv1d support. I decided to give it a try, but the torch module is not importing and it shows an error that 'torch.version' is not there. So I copied 'version.py' from my earlier version; after that, torch still fails to import, throwing another error: 'torch._C import default_generators' failed to import.
May I know whether the changes in that PR are applicable to the CPU version of torch or not?
Also, my earlier question still stands: will the quantized model work only if all the layers are quantized, or do we need to dequantize before a non-quantized layer?
I have finished quantizing my model. Now I tried to save it with 'jit.save(jit.script(model))', but it throws the error 'aten::slice.t(t[] l, int start, int end=9223372036854775807, int step=1) -> (t[]): could not match type tensor to list[t] in argument "l": cannot match list[t] to tensor.' Two more very similar errors popped up as well. I googled this error and in some discussions found that it concerns the slicing operator (:, :, :), saying that jit.script does not support scripting a directly used slicing operator. Is the error actually pointing to that? Because I do use the slicing operator in the middle of my model, and the error is thrown at exactly that line.
That is one issue; as an alternative I then tried to save the quantized model with state_dict(). I am able to save the model this way, but to perform inference I have to initialize the model with the params I saved earlier via state_dict(). Now the params are quantized, but our model is defined for float, so this error popped up: 'exception occurred: copying from quantized tensor to non-quantized tensor is not allowed, please use dequantize to get a float tensor from a quantized tensor'. So the issue is that I can't save my model with jit, and if I save it with state_dict() I can't initialize my model for inference.
Could you give a small repro for the error with aten::slice not working with jit?
Regarding loading the quantized state_dict - you will first have to convert the model to a quantized model before loading the state dict as well. You can call the prepare and convert APIs to do this (no need to calibrate). This way the model's state dict will match the saved quantized state dict and you should be able to load it.
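A rough sketch of that flow, assuming the same hypothetical ToyNet as above and an illustrative file name:

import torch
import torch.quantization as tq

# Rebuild the float model, then run prepare + convert (no calibration needed)
# so its structure and state-dict keys match the saved quantized model.
model = ToyNet().eval()
model.qconfig = tq.get_default_qconfig('fbgemm')
tq.prepare(model, inplace=True)
tq.convert(model, inplace=True)

# The quantized parameters now load into matching quantized modules.
model.load_state_dict(torch.load('quantized_state_dict.pth'))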
This worked successfully. It seems JIT accepts only a direct integer literal for the slice, e.g. 'w[:,:,:-16]', whereas I had initially written it as 'w = w[:,:,:-2*2**3]'. So I gave that change a try and it worked.
Something else I would like to mention: JIT is not able to find the attribute '.new_tensor', whereas '.clone().detach()' is recognized.
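A tiny illustration of that kind of substitution (the constant being built here is made up):

import torch

@torch.jit.script
def make_constant(x: torch.Tensor) -> torch.Tensor:
    # x.new_tensor([1.0, 2.0]) was not recognized by the scripting compiler here;
    # constructing the tensor explicitly and cloning/detaching it scripts fine.
    c = torch.tensor([1.0, 2.0], dtype=x.dtype, device=x.device)
    return c.clone().detach()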
I have a doubt about operations on quantized variables; I'm quoting it here:
z = self.dequant_tcnet(z)
w = self.dequant_tcnet(w)
v = self.dequant_tcnet(v)
x = self.dequant_tcnet(x)
z = z + w
x = x + v
x = self.quant_tcnet2(x)
Here, in order to perform the two operations z = z + w and x = x + v, I have to dequantize the variables involved and then perform the operations. Can we perform these operations without dequantizing, i.e., on the quantized variables? Because if I run this without dequantizing, I get the error 'RuntimeError: Could not run 'aten::add.Tensor' with arguments from the 'QuantizedCPU' backend. 'aten::add.Tensor' is only available for these backends: [CPU, MkldnnCPU, SparseCPU, Autograd, Profile].'
Can we perform addition on two quantized tensor variables without dequantizing them, through some other alternative?
For this I tried QFunctional and FloatFunctional, but the output is not up to the mark with those, whereas placing QuantStubs works well.
I have a concern here. My float model takes 0.2-0.3 seconds (~300 ms) to process a single input, whereas after quantizing the model with int8 quantization the time taken increased from 0.2-0.3 s (float precision) to 0.4-0.5 s (int8).
Here I show you the exact float model block and the quantized model block.
***** float block ******
*****Round 1********
y = self.conv1x12(x)
y = self.prelu2(y)
y = self.norm2(y)
w = self.depthwise_conv12(y)
#w = w[:,:,:-2*2**2]
w = w[:,:,:-8]
y = self.depthwise_conv2(y)
y = y[:,:,:-8]
y = self.prelu22(y)
y = self.norm22(y)
v = self.pointwise_conv2(y)
z = z + w
x = x + v
This is the float model block. This block/computation is repeated 13 more times (14 blocks in total), and it takes 0.2-0.3 seconds.
***** quantized block ******
*****Round 1********
y = self.conv1x12(x)
y = self.prelu2(y)
y = self.norm2(y)
w = self.depthwise_conv12(y)
w = w[:,:,:-8]
y = self.depthwise_conv2(y)
y = y[:,:,:-8]
y = self.prelu22(y)
y = self.norm22(y)
v = self.pointwise_conv2(y)
w = self.dequant_tcnet(w)
z = self.dequant_tcnet(z)
v = self.dequant_tcnet(v)
x = self.dequant_tcnet(x)
z = z + w
x = x + v
x = self.quant_tcnet3(x)
This is the quantized model block, where QuantStubs/DeQuantStubs are placed around those arithmetic operations and all the remaining layers are quantized. This quantized model takes 0.4-0.5 seconds.
So after quantizing my model, the model size is optimized but the computation time is not. Could you tell me whether there is any flaw? I cross-checked and the output is also good, but the computation time is not reduced.
For the add you could use torch.nn.quantized.FloatFunctional; the extra dequant and quant ops in the network could be slowing things down.
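A minimal sketch of that replacement (module and attribute names are illustrative), keeping the adds in the quantized domain:

import torch
import torch.nn as nn
import torch.nn.quantized as nnq

class AddBlock(nn.Module):
    def __init__(self):
        super().__init__()
        # One FloatFunctional per add, so each add gets its own observer/scale
        # during calibration; convert() swaps these for their quantized versions.
        self.skip_add = nnq.FloatFunctional()
        self.res_add = nnq.FloatFunctional()

    def forward(self, z, w, x, v):
        z = self.skip_add.add(z, w)   # no dequant/quant round trip
        x = self.res_add.add(x, v)
        return z, x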
Regarding performance, you can try running the torch.autograd.profiler on your model for some iterations to see which ops take up the most time. It will give you an op-level breakdown with runtimes so you can compare the float vs the quantized model.
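A minimal sketch of such a comparison, assuming an already-converted model_int8, the original model_fp32, and a representative input shape (all names and shapes are made up):

import torch

x = torch.randn(1, 8, 16000)   # representative input

with torch.no_grad():
    with torch.autograd.profiler.profile() as prof_fp32:
        for _ in range(20):
            model_fp32(x)
    with torch.autograd.profiler.profile() as prof_int8:
        for _ in range(20):
            model_int8(x)

print(prof_fp32.key_averages().table(sort_by="cpu_time_total"))
print(prof_int8.key_averages().table(sort_by="cpu_time_total"))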
Yeah, I did that replacement of QuantStubs/DeQuantStubs with FloatFunctional. I'll drop it here:
y = self.conv1x11(x)
y = self.prelu1(y)
y = self.norm1(y)
w = self.depthwise_conv11(y)
#w = w[:,:,:-2*2**1]
w = w[:,:,:-4]
y = self.depthwise_conv1(y)
#y = y[:,:,:-2*2**1]
y = y[:,:,:-4]
y = self.prelu11(y)
y = self.norm11(y)
v = self.pointwise_conv1(y)
#w = self.pointwise_conv_skp1(y)
#z = self.dequant_tcnet(z)
#w = self.dequant_tcnet(w)
#v = self.dequant_tcnet(v)
#x = self.dequant_tcnet(x)
#z = z + w
#x = x + v
z = self.Qf_s.add(z,w)
x = self.Qf_s.add(x,v)
#x = self.quant_tcnet2(x)
#z = self.quant(z)
As you can see, I have now removed the stubs and am using FloatFunctional, but the time taken is still not reduced.
We are currently implementing the operator using quantized::conv2d under the hood after unsqueezing the activation and weight tensors. That might be proving sub-optimal for certain input shapes.
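For context, roughly what that lowering looks like in float terms (a sketch of the idea, not the actual quantized kernel):

import torch
import torch.nn.functional as F

x = torch.randn(1, 8, 64)     # (N, C, L)
w = torch.randn(16, 8, 3)     # (out_channels, in_channels, kernel)

ref = F.conv1d(x, w, stride=1, padding=0, dilation=1)

# Emulate conv1d with conv2d by inserting a height dimension of 1
# into both the activation and the weight.
out = F.conv2d(x.unsqueeze(2), w.unsqueeze(2),
               stride=(1, 1), padding=(0, 0), dilation=(1, 1)).squeeze(2)

print(torch.allclose(ref, out, atol=1e-6))   # True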
Could you give us details about the input and weight dimensions and the parameters (kernel, stride, pad, dilation) to conv1d in your case?