Unpacking a quantized model in C++ (TorchScript format)

Hi, I am working with a quantized model in C++. I trained and quantized the model in Python (post-training quantization) and loaded it into C++. Can I parse the jitted model parameters (TorchScript format) in C++? I could not find any layer-unpacking facilities in torch::jit::script::Module.
After loading the model, I can dump the ScriptModule's submodules and parameters using m->dump() (where m is a torch::jit::script::Module):

--------------------------------------------------------------------------------------------------------------------
dumping module module __torch__.Net {
	  parameters {
	  }
	  attributes {
		training = False
	(Here->)  fc1 = <__torch__.torch.nn.intrinsic.quantized.modules.linear_relu.LinearReLU object at 0x5555566b75f0>
		Relu1 = <__torch__.torch.nn.modules.linear.Identity object at 0x5555566b3d40>
		fc2 = <__torch__.torch.nn.intrinsic.quantized.modules.linear_relu.LinearReLU object at 0x5555566a7030>
		Relu2 = <__torch__.torch.nn.modules.linear.Identity object at 0x5555567b1440>
		droput2 = <__torch__.torch.nn.modules.dropout.Dropout object at 0x5555567b1640>
		fc3 = <__torch__.torch.nn.quantized.modules.linear.Linear object at 0x5555567b22c0>
		quant = <__torch__.torch.nn.quantized.modules.Quantize object at 0x5555567b2ad0>
		dequant = <__torch__.torch.nn.quantized.modules.DeQuantize object at 0x5555567b8350>
		logMax = <__torch__.torch.nn.modules.activation.LogSoftmax object at 0x5555567b87c0>
	  }
	  methods {
		method forward {
		  graph(%self.1 : __torch__.Net,
				%x.1 : Tensor):
			%7 : int = prim::Constant[value=-1]() # ~//MNIST_PyTorch_Quantize.py:50:19
			%8 : int = prim::Constant[value=784]() # ~//MNIST_PyTorch_Quantize.py:50:23
			%3 : __torch__.torch.nn.quantized.modules.Quantize = prim::GetAttr[name="quant"](%self.1)
			%x0.1 : Tensor = prim::CallMethod[name="forward"](%3, %x.1) # :0:0
			%9 : int[] = prim::ListConstruct(%7, %8)
			%x1.1 : Tensor = aten::view(%x0.1, %9) # ~//MNIST_PyTorch_Quantize.py:50:12
			%12 : __torch__.torch.nn.intrinsic.quantized.modules.linear_relu.LinearReLU = prim::GetAttr[name="fc1"](%self.1)
			%x2.1 : Tensor = prim::CallMethod[name="forward"](%12, %x1.1) # :0:0
			%16 : __torch__.torch.nn.modules.linear.Identity = prim::GetAttr[name="Relu1"](%self.1)
			%x3.1 : Tensor = prim::CallMethod[name="forward"](%16, %x2.1) # :0:0
			%20 : __torch__.torch.nn.intrinsic.quantized.modules.linear_relu.LinearReLU = prim::GetAttr[name="fc2"](%self.1)
			%x4.1 : Tensor = prim::CallMethod[name="forward"](%20, %x3.1) # :0:0
			%24 : __torch__.torch.nn.modules.linear.Identity = prim::GetAttr[name="Relu2"](%self.1)
			%x5.1 : Tensor = prim::CallMethod[name="forward"](%24, %x4.1) # :0:0
			%28 : __torch__.torch.nn.modules.dropout.Dropout = prim::GetAttr[name="droput2"](%self.1)
			%x6.1 : Tensor = prim::CallMethod[name="forward"](%28, %x5.1) # :0:0
			%32 : __torch__.torch.nn.quantized.modules.linear.Linear = prim::GetAttr[name="fc3"](%self.1)
			%x7.1 : Tensor = prim::CallMethod[name="forward"](%32, %x6.1) # :0:0
			%36 : __torch__.torch.nn.quantized.modules.DeQuantize = prim::GetAttr[name="dequant"](%self.1)
			%x8.1 : Tensor = prim::CallMethod[name="forward"](%36, %x7.1) # :0:0
			%40 : __torch__.torch.nn.modules.activation.LogSoftmax = prim::GetAttr[name="logMax"](%self.1)
			%42 : Tensor = prim::CallMethod[name="forward"](%40, %x8.1) # :0:0
			return (%42)
		}
	  }
	  submodules {
		module __torch__.torch.nn.intrinsic.quantized.modules.linear_relu.LinearReLU {
		  parameters {
		  }
		  attributes {
			training = False
			in_features = 784
			out_features = 512
			scale = 0.048926487565040588
			zero_point = 0
			_packed_params = <__torch__.torch.nn.quantized.modules.linear.LinearPackedParams object at 0x5555566bcc40>
		  }
		  methods {
			method forward {
			  graph(%self.1 : __torch__.torch.nn.intrinsic.quantized.modules.linear_relu.LinearReLU,
					%input.1 : Tensor):
				%4 : __torch__.torch.nn.quantized.modules.linear.LinearPackedParams = prim::GetAttr[name="_packed_params"](%self.1)
				%5 : Tensor = prim::GetAttr[name="_packed_params"](%4)
				%7 : float = prim::GetAttr[name="scale"](%self.1)
				%9 : int = prim::GetAttr[name="zero_point"](%self.1)
				%Y_q.1 : Tensor = quantized::linear_relu(%input.1, %5, %7, %9) # ~/python3.7/site-packages/torch/nn/intrinsic/quantized/modules/linear_relu.py:29:14
					return (%Y_q.1)
			}
		  }
		  submodules {
		   (+)        module __torch__.torch.nn.quantized.modules.linear.LinearPackedParams {
		  }
		}
		 module __torch__.torch.nn.modules.linear.Identity {} (+)
		 module __torch__.torch.nn.intrinsic.quantized.modules.linear_relu.LinearReLU {} (+)
		 module __torch__.torch.nn.modules.linear.Identity {} (+)
		 module __torch__.torch.nn.modules.dropout.Dropout {} (+)
		 module __torch__.torch.nn.quantized.modules.linear.Linear {} (+) 
		 module __torch__.torch.nn.quantized.modules.Quantize {} (+) 
		 module __torch__.torch.nn.quantized.modules.DeQuantize {}  (+)
		 module __torch__.torch.nn.modules.activation.LogSoftmax {} (+)
		} //   end of submodules 
		}  //   end of  dumping module module __torch__.Net
	--------------------------------------------------------------------------------------------------------

Note: (+) marks collapsed lines omitted to save space.
Torch version 1.6.0+.
My questions:
1- During model loading, are the module layers packed again into a format like torch::nn::Linear, torch::nn::Conv1d, etc.? How can I access them?
2- Are the pointers printed in the dump (like the line marked with “(Here->)”) Python objects and methods, or C++ objects and methods? Are they castable to the torch::nn::* formats?
3- What is the recommended procedure to reconstruct the model (the number of layers, the attributes/configuration of each layer, and the corresponding trained weights) from the jitted/TorchScript format?


1- During model loading, are the module layers packed again into a format like torch::nn::Linear, torch::nn::Conv1d, etc.? How can I access them?

What is torch::nn::Linear? Are you talking about the PyTorch C++ API? The weights are packed inside the quantized linear modules; you can use https://github.com/pytorch/pytorch/blob/master/torch/nn/quantized/modules/linear.py#L34 to get the unpacked weights.
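For example, on the C++ side you can already read the plain attributes of a quantized layer from the loaded module. A minimal sketch, assuming the scripted model was saved as model.pt (that file name is an assumption) and using the submodule name fc1 from the dump above:

```cpp
#include <torch/script.h>
#include <iostream>

int main() {
  // Load the scripted, quantized model (file name is an assumption).
  torch::jit::Module m = torch::jit::load("model.pt");

  // Navigate to the quantized LinearReLU submodule by the attribute
  // name shown in the dump above.
  torch::jit::Module fc1 = m.attr("fc1").toModule();

  // Plain attributes come back as IValues and convert directly.
  std::cout << "fc1 scale      = " << fc1.attr("scale").toDouble() << "\n";
  std::cout << "fc1 zero_point = " << fc1.attr("zero_point").toInt() << "\n";

  // The weight itself lives inside the LinearPackedParams submodule in a
  // packed, backend-specific format; it is not directly usable as a plain
  // tensor, so unpack it through the layer's _weight_bias() method instead.
  torch::jit::Module packed = fc1.attr("_packed_params").toModule();
  std::cout << "fc1 packed params object: " << packed.type()->str() << "\n";
  return 0;
}
```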

Are the pointers printed in the dump (like the line marked with “(Here->)”) Python objects and methods, or C++ objects and methods? Are they castable to the torch::nn::* formats?

These are TorchScript objects, I think. I’m not sure about the relationship between the PyTorch C++ API and TorchScript; cc @Michael_Suo, could you comment on this?

What is the recommended procedure to reconstruct the model (the number of layers, the attributes/configuration of each layer, and the corresponding trained weights) from the jitted/TorchScript format?

Not sure I understand the question; could you be more concrete?

Yes, I mean the C++ API. As mentioned in the description, I trained the model in Python and exported it as TorchScript for C++. Is the function “_weight_bias” accessible in the TorchScript format (I can see that it has the decorator @torch.jit.export)? If yes, could you please show how to use it with a loaded TorchScript model?
Rephrasing 3-: Given a model that was trained in Python, saved as TorchScript, and loaded into the C++ front end, I want to extract the number of layers, the type of each layer (Linear, Conv1d, …), the size of each layer, and the weights associated with each layer. How can I do that?

Here is an example of calling a method on a TorchScript Module: insert_quant_dequant.cpp source code [pytorch/torch/csrc/jit/passes/quantization/insert_quant_dequant.cpp] - Woboq Code Browser
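Along those lines, here is a sketch of calling the exported _weight_bias method from C++. It assumes the model file name model.pt, uses the submodule name fc1 from the dump above, and assumes the @torch.jit.export'ed _weight_bias method survived scripting (find_method returns nullopt otherwise):

```cpp
#include <torch/script.h>
#include <iostream>

int main() {
  torch::jit::Module m = torch::jit::load("model.pt");   // assumed file name
  torch::jit::Module fc1 = m.attr("fc1").toModule();

  // find_method returns c10::nullopt if the method was not exported.
  if (auto wb = fc1.find_method("_weight_bias")) {
    // The method takes no arguments besides self and returns a
    // (weight, bias) tuple; bias may be None.
    auto elems = (*wb)({}).toTuple()->elements();
    at::Tensor weight = elems[0].toTensor();  // quantized weight tensor
    std::cout << "fc1 weight sizes: " << weight.sizes() << "\n";
    if (!elems[1].isNone()) {
      at::Tensor bias = elems[1].toTensor();
      std::cout << "fc1 bias sizes: " << bias.sizes() << "\n";
    }
  }
  return 0;
}
```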

If by C++ API you mean TorchScript, then this should work. There is another C++ API for authoring models in C++ (C++ — PyTorch 2.1 documentation), which I’m not familiar with.

For question 3: the API of torch::jit::Module can be found in module.h source code [pytorch/torch/csrc/jit/api/module.h] - Woboq Code Browser
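Based on that header, a sketch of walking the loaded module to list each layer, its type, its plain attributes, and (where a quantized linear kept the exported _weight_bias method) its unpacked weight shape. The file name model.pt is an assumption:

```cpp
#include <torch/script.h>
#include <iostream>
#include <string>

// Recursively walk a loaded TorchScript module and print, for every
// submodule, its registered name, its type, its simple numeric attributes,
// and the unpacked weight shape where _weight_bias is available.
void describe(const torch::jit::Module& m, const std::string& prefix) {
  for (const auto& child : m.named_children()) {
    const torch::jit::Module& sub = child.value;
    std::string path = prefix.empty() ? child.name : prefix + "." + child.name;
    std::cout << path << " : " << sub.type()->str() << "\n";

    // Plain (non-packed) attributes such as scale / zero_point.
    for (const auto& attr : sub.named_attributes(/*recurse=*/false)) {
      if (attr.value.isDouble() || attr.value.isInt()) {
        std::cout << "    " << attr.name << " = " << attr.value << "\n";
      }
    }

    // Quantized linear layers expose their weights through _weight_bias().
    if (auto wb = sub.find_method("_weight_bias")) {
      auto elems = (*wb)({}).toTuple()->elements();
      std::cout << "    weight sizes: " << elems[0].toTensor().sizes() << "\n";
    }

    describe(sub, path);
  }
}

int main() {
  torch::jit::Module m = torch::jit::load("model.pt");  // assumed file name
  describe(m, "");
  return 0;
}
```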