Bias quantized with the same scale as the Convolution result

iviarcio · October 7, 2021, 7:06pm

Hey everyone. I’m building a new backend for Glow called NMP (from NeuroMorphic Processor). It is an embedded device for running inference on convolutional neural networks, and it only works with quantized models (Int8 and Int16). It turns out that the external library that performs the convolution operations expects the bias to be quantized with the same scale as the convolution result (the same applies to FullyConnected operations). Any ideas on how I can specialize this type of quantization?
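For reference, here is a minimal standalone sketch of what quantizing the bias "with the same scale as the convolution result" means numerically. It assumes the affine scheme Glow uses (real = scale * (q - offset)) and a hypothetical output scale of 0.05. The names QParams, quantizeValue and quantizeBiasToOutput are illustrative only, not Glow APIs. A bias is more commonly quantized with scale inputScale * weightScale into a wider integer type, so the NMP library's convention is the unusual part here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Affine quantization parameters: real = scale * (q - offset).
struct QParams {
  float scale;
  int32_t offset;
};

// Quantize one real value: q = round(x / scale + offset), saturated to [lo, hi].
int32_t quantizeValue(float x, QParams p, int32_t lo, int32_t hi) {
  assert(p.scale > 0.0f && "scale must be positive");
  float q = std::round(x / p.scale + static_cast<float>(p.offset));
  return static_cast<int32_t>(
      std::clamp(q, static_cast<float>(lo), static_cast<float>(hi)));
}

// Quantize a float bias with the *output's* parameters (hypothetical helper
// modelling what the NMP library is described as expecting).
std::vector<int32_t> quantizeBiasToOutput(const std::vector<float> &bias,
                                          QParams outParams, int32_t lo,
                                          int32_t hi) {
  std::vector<int32_t> q;
  q.reserve(bias.size());
  for (float b : bias) {
    q.push_back(quantizeValue(b, outParams, lo, hi));
  }
  return q;
}
```

For example, quantizeBiasToOutput({0.1f, -0.25f, 1.0f, 10.0f}, {0.05f, 0}, -128, 127) yields {2, -5, 20, 127}. The last value saturates, which is worth keeping in mind if the bias ends up in an 8-bit type.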

iviarcio · October 12, 2021, 5:32pm

I inserted the following code in the NMPBackend module:

template <typename T> static void RescaleBias(Node &node, Function *F) {
  auto *typed = cast<T>(&node);
  const TypeRef resTy = typed->getResult().getType();
  NodeValue bias = typed->getBias();
  const TypeRef biasTy = bias.getType();
  // Keep the bias element type and dims, but adopt the result's scale/offset.
  const TypeRef newTy = F->getParent()->uniqueType(
      biasTy->getElementType(), biasTy->dims(), resTy->getScale(),
      resTy->getOffset());
  bias.setType(newTy);
}

Expected<bool>
NMPBackend::transformPostOptPipeline(Function *F,
                                     CompilationContext &cctx) const {
  for (auto &node : F->getNodes()) {
    switch (node.getKind()) {
    case Kinded::Kind::ConvolutionNodeKind:
      RescaleBias<ConvolutionNode>(node, F);
      break;
    case Kinded::Kind::FullyConnectedNodeKind:
      RescaleBias<FullyConnectedNode>(node, F);
      break;
    default:
      break;
    }
  }
  convertQuantizedConstants(F, cctx);
  return true;
}

When I dump the graph and the LIR, the type and quantization information is correct, but the data is not quantized correctly. Apparently, the convertQuantizedConstants(F, cctx) call has no effect. Any tips?

jfix · October 19, 2021, 5:01pm

Hi @iviarcio, the NodeValue::setType() call only changes the type of the value; it does not change the actual payload of a Constant node, if that is what you were expecting to happen.
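To illustrate why a type change alone is not enough, here is a minimal standalone sketch using a hypothetical QuantConst struct (not Glow's Constant class) that holds an integer payload plus scale/offset metadata. Changing only the metadata leaves the integers untouched, so every element now represents a different real value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a quantized constant: raw integer payload plus
// type metadata. Not Glow's Constant class.
struct QuantConst {
  std::vector<int32_t> payload;
  float scale;
  int32_t offset;
};

// Real value represented by element i: scale * (q - offset).
float realValue(const QuantConst &c, std::size_t i) {
  assert(i < c.payload.size());
  return c.scale * static_cast<float>(c.payload[i] - c.offset);
}

// Analogue of changing only the type: the metadata changes, the payload does not.
void setTypeOnly(QuantConst &c, float newScale, int32_t newOffset) {
  c.scale = newScale;
  c.offset = newOffset;
}
```

For example, a payload of 40 with scale 0.025 represents 1.0; after setTypeOnly(c, 0.05f, 0) the same 40 represents 2.0.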

What you need to do is something like this:

template <typename T> static void rescaleBias(T *node, Function *F) {
  auto *bias = dyn_cast<Constant>(node->getBias());
  CHECK(bias) << "Expected bias to be a Constant";
  // Recover the real-valued bias from its current quantization parameters.
  Tensor floatBias = quantization::dequantizeTensor(bias->getPayload(),
                                                    ElemKind::FloatTy);
  // Re-quantize it with the scale and offset of the node's result.
  auto outTy = node->getResult().getType();
  Tensor newBias = quantization::quantizeTensor(
      floatBias, {outTy->getScale(), outTy->getOffset()},
      outTy->getElementType());
  // Create a new Constant holding the re-quantized payload and rewire users.
  Constant *newConst = F->getParent()->createConstant(
      bias->getName().str() + "_converted", std::move(newBias));
  bias->getOutput().replaceAllUsesOfWith(newConst->getOutput());
}
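The same dequantize-then-requantize idea, stripped of Glow types, can be sketched as below. It works on plain std::vector rather than Glow's Tensor, and the helper names (dequantize, quantize, requantizeBias) are hypothetical; it only mirrors the two steps that quantization::dequantizeTensor and quantization::quantizeTensor perform in the code above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Affine quantization parameters: real = scale * (q - offset).
struct QParams {
  float scale;
  int32_t offset;
};

// Step 1 (analogous to dequantizeTensor): x = scale * (q - offset).
std::vector<float> dequantize(const std::vector<int32_t> &q, QParams p) {
  std::vector<float> x;
  x.reserve(q.size());
  for (int32_t v : q) {
    x.push_back(p.scale * static_cast<float>(v - p.offset));
  }
  return x;
}

// Step 2 (analogous to quantizeTensor): q = round(x / scale + offset), saturated.
std::vector<int32_t> quantize(const std::vector<float> &x, QParams p,
                              int32_t lo, int32_t hi) {
  assert(p.scale > 0.0f && "scale must be positive");
  std::vector<int32_t> q;
  q.reserve(x.size());
  for (float v : x) {
    float r = std::round(v / p.scale + static_cast<float>(p.offset));
    q.push_back(static_cast<int32_t>(
        std::clamp(r, static_cast<float>(lo), static_cast<float>(hi))));
  }
  return q;
}

// Re-express a bias payload, stored with `from`, in the output's parameters `to`.
std::vector<int32_t> requantizeBias(const std::vector<int32_t> &q, QParams from,
                                    QParams to, int32_t lo, int32_t hi) {
  return quantize(dequantize(q, from), to, lo, hi);
}
```

A bias stored as {1000, -250, 50} with scale 0.001 (real values 1.0, -0.25, 0.05) becomes {20, -5, 1} under an output scale of 0.05. Note that the Glow snippet passes outTy->getElementType() to quantizeTensor, so the new bias takes the output's element type (e.g. Int8QTy); with a narrow type, large bias values saturate, so keeping the original bias element type may be worth considering.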

iviarcio · October 20, 2021, 11:54am

Thank you @jfix. I had already detected this issue and had adjusted the payload by a factor equal to the ratio between the old and new scales, but your solution is more suitable since it follows Glow's patterns.
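The ratio-based adjustment mentioned here can be sketched as follows. Since real = s_old * (q - o_old), requantizing with (s_new, o_new) gives q_new = round((q - o_old) * s_old / s_new + o_new), which is mathematically equivalent (up to floating-point rounding) to dequantizing and then quantizing, folded into one multiply. This is a hypothetical standalone helper, not code from the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Rescale a quantized payload from (oldScale, oldOffset) to (newScale, newOffset)
// by the ratio of the scales. With both offsets at 0 this reduces to
// q_new = round(q_old * oldScale / newScale).
std::vector<int32_t> rescaleByRatio(const std::vector<int32_t> &q,
                                    float oldScale, int32_t oldOffset,
                                    float newScale, int32_t newOffset,
                                    int32_t lo, int32_t hi) {
  assert(oldScale > 0.0f && newScale > 0.0f && "scales must be positive");
  const float ratio = oldScale / newScale;
  std::vector<int32_t> out;
  out.reserve(q.size());
  for (int32_t v : q) {
    float r = std::round(static_cast<float>(v - oldOffset) * ratio +
                         static_cast<float>(newOffset));
    out.push_back(static_cast<int32_t>(
        std::clamp(r, static_cast<float>(lo), static_cast<float>(hi))));
  }
  return out;
}
```

With both offsets at 0, rescaleByRatio({1000, -250, 50}, 0.001f, 0, 0.05f, 0, -128, 127) gives {20, -5, 1}, the same result as the dequantize/quantize route. The Glow-based version still has the advantage of producing a new Constant whose type and payload are consistent in one step.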