Inference error after int8 quantization with PyTorch

When I run inference on my model after int8 quantization, I get the following error. What should I do to solve it?

NotImplementedError: Could not run ‘quantized::conv2d.new’ with arguments from the ‘CPU’ backend. This could be because the operator doesn’t exist for this backend, or was omitted during the selective/custom build process (if using custom build).

Can you provide the code for the model you are trying to quantize? FYI, quantization is not implemented for CUDA yet.

At the moment PyTorch doesn’t provide quantized operator implementations on CUDA; this is a direction for future work. Move the model to CPU in order to test the quantized functionality.
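To check the quantized path on CPU, a minimal eager-mode sketch could look like the one below. SmallConvNet, its layer sizes, and the random calibration data are hypothetical stand-ins, not the model from this thread. One common cause of the quantized::conv2d.new / 'CPU' backend error is a plain float tensor reaching a quantized conv, so the sketch wraps the input and output in QuantStub and DeQuantStub.

```python
import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub, get_default_qconfig, prepare, convert


class SmallConvNet(nn.Module):
    """Hypothetical toy model, used only to illustrate the workflow."""

    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # float -> quantized at the model input
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()  # quantized -> float at the model output

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)


# Pick a quantized engine that this build of PyTorch supports on CPU.
engines = torch.backends.quantized.supported_engines
backend = "fbgemm" if "fbgemm" in engines else "qnnpack"
torch.backends.quantized.engine = backend

model = SmallConvNet().cpu().eval()
model.qconfig = get_default_qconfig(backend)

prepared = prepare(model)                 # insert observers (returns a copy)
with torch.no_grad():
    prepared(torch.rand(4, 3, 16, 16))    # calibrate on random stand-in data
quantized = convert(prepared)             # swap in quantized modules

with torch.no_grad():
    out = quantized(torch.rand(1, 3, 16, 16))
print(type(quantized.conv), out.shape, out.dtype)
```

If the quant stub is left out of a model like this, the converted conv receives an ordinary float CPU tensor, which is one situation where the error above tends to show up.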

I run inference on the model in CPU mode. I switched to graph mode to quantize the model, but it produced errors:

isn’t the if branch supported?
def forward(self, x):
    if x.ndim == 5:
        return self.forward_time_series(x)
    else:
        return self.forward_single_frame(x)

@FengMu1995 the if branch is definitely supported

Here is an example. Without looking at the entire code it is difficult to understand the issue. I suspect that some elements are being switched between GPU and CPU, and since quantization does not work on the GPU, it throws an error.

import torch
import torch.nn as nn

class random_model(nn.Module):
    def __init__(self):
        super(random_model, self).__init__()
        self.model1 = nn.Sequential(
            nn.Linear(100, 10), 
            nn.BatchNorm1d(10),
            nn.ReLU(),
            nn.Linear(10, 4), 
            nn.BatchNorm1d(4),
            nn.ReLU(),
            nn.Linear(4, 1),
        )
        self.model2 = nn.Sequential(
            nn.Linear(100, 10), 
            nn.BatchNorm1d(10),
            nn.ReLU(),
            nn.Linear(10, 1),
        )
    def forward(self, x, flag_condition=True):
        if flag_condition:
            return self.model1(x)
        else:
            return self.model2(x)
        
X = torch.rand(100, 100)
y = torch.randint(2,(100,)).type(torch.FloatTensor)
model = random_model()
criterion = nn.MSELoss()
num_epochs = 100
learning_rate = 1e-3  # value the optimizer actually used in the original post
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for cur_epoch in range(num_epochs):
    model.zero_grad()
    if cur_epoch % 2 ==0:
        output = model(X, flag_condition=True)
    else:
        output = model(X, flag_condition=False)
    # squeeze (100, 1) -> (100,) so the prediction and target shapes match
    loss = criterion(output.squeeze(1), y)
    loss.backward()
    optimizer.step()
    print("Cur Epoch {0} loss is {1}".format(cur_epoch, loss.item()))

Please take a look at the Quantization page of the PyTorch 1.10.0 documentation.

The “if branch” is not used to switch between GPU and CPU. It only checks the input dimension.
The source code is as follows:
if x.ndim == 5:
    return self.forward_time_series(x)
else:
    return self.forward_single_frame(x)

The error is as follows:

File "/home/wudi/Software/yes/envs/torch1.9/lib/python3.8/site-packages/torch/quantization/quantize_jit.py", line 54, in _prepare_jit
    model_c = torch._C._jit_pass_insert_observers(model._c,
RuntimeError: branches for if should return values that are observed consistently, if node:%5 : Tensor[] = prim::If(%4) # /data00/peterlin/RVM/model/mobilenetv3.py:69:8
  block0():
    %6 : Tensor[] = prim::CallMethod[name="forward_time_series"](%self.1, %x.1) # /data00/peterlin/RVM/model/mobilenetv3.py:70:19
    -> (%6)
  block1():
    %7 : Tensor[] = prim::CallMethod[name="forward_single_frame"](%self.1, %x.1) # /data00/peterlin/RVM/model/mobilenetv3.py:72:19
    -> (%7)

the problem has been solved

@FengMu1995 aah okay. Let me try that as well. Sounds interesting

Please share the solution.

I have run into the same problem. I guess it is because your original model is built from conv2d, but your quantized model is built from quantized::conv2d. When you try to restore a quantized model from disk, it cannot run quantized::conv2d.new on conv2d.
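For the restore-from-disk case, a rough sketch of the usual pattern is below: rebuild the quantized module structure with prepare and convert first, and only then load the saved weights. TinyNet and make_quantized are hypothetical names for illustration, and the in-memory state_dict stands in for what torch.save and torch.load would move through a file.

```python
import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub, get_default_qconfig, prepare, convert


class TinyNet(nn.Module):
    """Hypothetical toy model standing in for the real network."""

    def __init__(self):
        super().__init__()
        self.quant = QuantStub()
        self.conv = nn.Conv2d(3, 4, kernel_size=3)
        self.dequant = DeQuantStub()

    def forward(self, x):
        return self.dequant(self.conv(self.quant(x)))


# Pick a quantized engine that this build of PyTorch supports on CPU.
engines = torch.backends.quantized.supported_engines
backend = "fbgemm" if "fbgemm" in engines else "qnnpack"
torch.backends.quantized.engine = backend


def make_quantized(calibration_batch=None):
    """Build a float TinyNet, then prepare -> (optionally calibrate) -> convert."""
    float_model = TinyNet().eval()
    float_model.qconfig = get_default_qconfig(backend)
    prepared = prepare(float_model)
    if calibration_batch is not None:
        with torch.no_grad():
            prepared(calibration_batch)
    return convert(prepared)


quantized = make_quantized(torch.rand(4, 3, 8, 8))
saved_state = quantized.state_dict()  # stand-in for the checkpoint on disk

# Restoring: recreate the quantized structure, then load the saved values.
# Skipping calibration may emit a warning; the loaded values overwrite the defaults.
restored = make_quantized()
restored.load_state_dict(saved_state)

x = torch.rand(1, 3, 8, 8)
with torch.no_grad():
    same = torch.allclose(quantized(x), restored(x))
print(same)
```

Loading the same state dict into a plain float TinyNet would not work, because the float conv has no scale or zero_point entries to receive the quantized parameters.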

It is different; this “if branch” problem has not been solved.

Can you try using eager mode to quantize the if branch? Also, can you describe the problem in a bit more detail: what are you trying to achieve, and what is the output?
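As a rough sketch of the eager-mode suggestion: in eager mode the forward method stays ordinary Python, so a branch on x.ndim is not traced or scripted. The method names forward_time_series and forward_single_frame come from the post above, but their bodies here, the BranchingNet class, and the layer sizes are assumptions made up for illustration.

```python
import torch
import torch.nn as nn
from torch.quantization import QuantStub, DeQuantStub, get_default_qconfig, prepare, convert


class BranchingNet(nn.Module):
    """Hypothetical model that dispatches on the number of input dimensions."""

    def __init__(self):
        super().__init__()
        self.quant = QuantStub()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.dequant = DeQuantStub()

    def forward_single_frame(self, x):
        # x: (B, C, H, W)
        return self.dequant(self.conv(self.quant(x)))

    def forward_time_series(self, x):
        # x: (B, T, C, H, W); fold time into the batch, then restore it
        b, t = x.shape[:2]
        y = self.forward_single_frame(x.flatten(0, 1))
        return y.reshape(b, t, *y.shape[1:])

    def forward(self, x):
        if x.ndim == 5:
            return self.forward_time_series(x)
        else:
            return self.forward_single_frame(x)


# Pick a quantized engine that this build of PyTorch supports on CPU.
engines = torch.backends.quantized.supported_engines
backend = "fbgemm" if "fbgemm" in engines else "qnnpack"
torch.backends.quantized.engine = backend

model = BranchingNet().eval()
model.qconfig = get_default_qconfig(backend)
prepared = prepare(model)

# Calibrate through both branches so the observers see both input layouts.
with torch.no_grad():
    prepared(torch.rand(2, 3, 16, 16))
    prepared(torch.rand(2, 3, 3, 16, 16))
quantized = convert(prepared)

with torch.no_grad():
    out_frame = quantized(torch.rand(2, 3, 16, 16))
    out_series = quantized(torch.rand(2, 3, 3, 16, 16))
print(out_frame.shape, out_series.shape)
```

Both branches here share the same stubs and conv, so they are observed the same way; whether that carries over to the real model depends on how its two forward paths are actually written.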