`Expected self.scalar_type() == ScalarType::Float to be true, but got false.` when doing quantization aware training?

feiyuhuahuo · January 11, 2021, 10:01am

I’m trying to do quantization-aware training (QAT) for YOLOv5, but I get the following error:
RuntimeError: Expected self.scalar_type() == ScalarType::Float to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
Here is the full traceback:

Traceback (most recent call last):
  File "train.py", line 464, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 269, in train
    pred = model(imgs)  # forward
  File "/home/feiyu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/feiyu/yolov5/models/yolo.py", line 132, in forward
    x = self.forward_once(x, profile)  # single-scale inference, train
  File "/home/feiyu/yolov5/models/yolo.py", line 150, in forward_once
    x = m(x)  # run
  File "/home/feiyu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/feiyu/yolov5/models/common.py", line 114, in forward
    return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))
  File "/home/feiyu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/feiyu/yolov5/models/common.py", line 48, in fuseforward
    return self.act(self.conv(x))
  File "/home/feiyu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 731, in _call_impl
    hook_result = hook(self, input, result)
  File "/home/feiyu/anaconda3/lib/python3.7/site-packages/torch/quantization/quantize.py", line 82, in _observer_forward_hook
    return self.activation_post_process(output)
  File "/home/feiyu/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/feiyu/anaconda3/lib/python3.7/site-packages/torch/quantization/fake_quantize.py", line 104, in forward
    self.quant_max)
RuntimeError: Expected self.scalar_type() == ScalarType::Float to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)

And here is the code where the error occurs:

class Conv(nn.Module):
    # Standard convolution
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.Hardswish() if act else nn.Identity()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def fuseforward(self, x):
        return self.act(self.conv(x))

I tried replacing nn.Hardswish with nn.ReLU, but it didn’t help. I also checked the dtype of the input x, which is always torch.float32.

feiyuhuahuo · January 12, 2021, 2:47am

I added X = X.float() directly above the line where the error is raised in torch/quantization/fake_quantize.py, because I found that the dtype of X at that point is torch.float16. Surprisingly, it works, and the model can be trained successfully.

        if self.fake_quant_enabled[0] == 1:
            if self.qscheme == torch.per_channel_symmetric or self.qscheme == torch.per_channel_affine:
                X = torch.fake_quantize_per_channel_affine(X, self.scale, self.zero_point,
                                                           self.ch_axis, self.quant_min, self.quant_max)
            else:
                # Modified here: cast X to float32 before fake quantization,
                # because X arrives as torch.float16.
                X = X.float()
                X = torch.fake_quantize_per_tensor_affine(X, float(self.scale),
                                                          int(self.zero_point), self.quant_min,
                                                          self.quant_max)
        return X

Is this expected?
Training hasn’t finished yet, so I can’t tell whether the resulting model will be usable. My PyTorch version is 1.7.0.
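
For anyone debugging the same error, one way to see where activations stop being float32 without editing the installed PyTorch sources is to register forward hooks that record each module's output dtype. The following is only a sketch: `record_output_dtypes` is a hypothetical helper, and the tiny `nn.Sequential` model stands in for the real YOLOv5 network.

```python
import torch
import torch.nn as nn


def record_output_dtypes(model):
    """Hypothetical helper: attach hooks that log each module's output dtype."""
    dtypes = {}
    handles = []
    for name, module in model.named_modules():
        # Bind `name` as a default argument so each hook keeps its own value.
        def hook(mod, inputs, output, name=name):
            if isinstance(output, torch.Tensor):
                # Also note whether autocast was active during this call.
                dtypes[name] = (output.dtype, torch.is_autocast_enabled())
        handles.append(module.register_forward_hook(hook))
    return dtypes, handles


# Toy stand-in for the real model (an assumption for illustration only).
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
dtypes, handles = record_output_dtypes(model)
model(torch.randn(1, 3, 8, 8))

# Remove the hooks once the diagnostic run is done.
for h in handles:
    h.remove()

for name, (dtype, autocast_on) in dtypes.items():
    print(repr(name), dtype, "autocast:", autocast_on)
```

If a module reports torch.float16 while its input was float32, something between the two (such as an autocast context) is changing the dtype.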

feiyuhuahuo · January 14, 2021, 7:09am

My bad. I didn’t notice that the forward pass is wrapped in torch.cuda.amp.autocast, which is why X was torch.float16. It works now.
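
A minimal sketch of that kind of fix, assuming the training loop wraps the forward pass in torch.cuda.amp.autocast: pass enabled=False while doing QAT so that activations reach the fake-quantize modules as float32. The helper `forward_pass` and the flag `use_amp` are hypothetical names, and the small model is a stand-in rather than the actual YOLOv5 code.

```python
import torch
import torch.nn as nn


def forward_pass(model, imgs, use_amp):
    # Hypothetical helper: run the forward pass with autocast on or off.
    # For QAT, use_amp should be False so outputs stay float32.
    with torch.cuda.amp.autocast(enabled=use_amp):
        return model(imgs)


# Toy stand-in for the real model and batch (assumptions for illustration).
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
imgs = torch.randn(2, 3, 16, 16)

pred = forward_pass(model, imgs, use_amp=False)
print(pred.dtype)
```

Disabling autocast this way leaves the PyTorch sources untouched, unlike the X.float() workaround, but it gives up mixed-precision speedups for the QAT run.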

loveltyoic · May 19, 2021, 7:38am

Hi, I ran into the same problem. Could you please post the updated code here? Thanks.

addisonklinke · June 9, 2021, 8:57pm

@loveltyoic There’s more discussion in the related GitHub issue here