Quantization and nn.Parameters

My model defines the following field:

self.relative_position_bias_table = nn.Parameter(
    torch.zeros((2 * window_size[0] - 1) * (2 * window_size[1] - 1),
                num_heads))  # 2*Wh-1 * 2*Ww-1, nH

And the following method:

def init_weights(self):
    super(WindowMSA, self).init_weights()

    trunc_normal_(self.relative_position_bias_table, std=0.02)

I believe this method is only called at the start of training. I’m working with a pre-trained model, and yet self.relative_position_bias_table is not all zeros, so I concluded that nn.Parameter objects are also saved in the model file and loaded the same way as weights.
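This is easy to confirm: registered nn.Parameter fields show up in named_parameters() and in the state dict, so they are saved and restored with the checkpoint (model below stands for whatever module holds the WindowMSA blocks):

for name, param in model.named_parameters():
    if "relative_position_bias_table" in name:
        print(name, param.shape)  # the table is listed right alongside the weights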

For some reason, however, quantizing the model with torch.quantization.convert skips this field: it is left as a regular float tensor instead of being converted to int8 like the model’s other parameters, so I cannot combine it in operations with quantized tensors.
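For concreteness, this is the kind of thing I see after conversion (the module path below is just illustrative of where the block sits in my model):

qmodel = torch.quantization.convert(model)
# Weights owned by swapped modules (Linear, Conv, ...) come out quantized,
# but the bias table is still a plain float parameter:
print(qmodel.layers[0].attn.relative_position_bias_table.dtype)  # torch.float32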

What is the standard way of telling PyTorch to treat nn.Parameter fields like ordinary weights and quantize them to int8?

One option would be to explicitly quantize this tensor on every forward pass, roughly as sketched below, but that seems wasteful. Caching the quantized version, on the other hand, sounds hacky and like something PyTorch should be doing for me anyway.
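The per-call version would look something like this (the scale and zero point are placeholders; in practice they would come from an observer):

q_table = torch.quantize_per_tensor(
    self.relative_position_bias_table.detach(),
    scale=0.02, zero_point=0, dtype=torch.quint8)  # placeholder qparams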

Hi @yannbane, PyTorch quantization works on operations (conv, linear, etc.) and the weights used by those operations; it does not know how to quantize arbitrary parameters.

For the case you describe, the right way to quantize self.relative_position_bias_table would depend on how this parameter is used. If you share which operations use this parameter, I’m happy to help further with that context.


Thanks for the response. It’s just a tensor that gets summed with another tensor (attn). Here’s what I’m currently doing:

        # Use the cached int8 table once it exists; fall back to the float one.
        if self.use_qtable:
            relative_position_bias_table = self.qrelative_position_bias_table
        else:
            relative_position_bias_table = self.relative_position_bias_table

        relative_position_bias = relative_position_bias_table[
            self.relative_position_index.view(-1)].view(
                self.window_size[0] * self.window_size[1],
                self.window_size[0] * self.window_size[1],
                -1)  # Wh*Ww, Wh*Ww, nH
        relative_position_bias = relative_position_bias.permute(
            2, 0, 1).contiguous()  # nH, Wh*Ww, Wh*Ww

        # f_add wraps the addition (e.g. a FloatFunctional) so it can be quantized.
        attn = self.f_add.add(attn, relative_position_bias.unsqueeze(0))

The model has a field called use_qtable, which starts out False. After quantization statistics are collected, I call a method once that quantizes self.relative_position_bias_table and stores the result in self.qrelative_position_bias_table. The same method sets use_qtable to True, and from then on the model uses the quantized version.
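Something along these lines (the qparams derivation is simplified here to a standalone observer, just to illustrate the idea):

def freeze_bias_table(self):
    # Derive scale / zero_point for the table from its value range.
    obs = torch.quantization.MinMaxObserver(dtype=torch.quint8)
    obs(self.relative_position_bias_table)
    scale, zero_point = obs.calculate_qparams()

    # Quantize once and cache; forward() uses the cached copy from now on.
    self.qrelative_position_bias_table = torch.quantize_per_tensor(
        self.relative_position_bias_table.detach(),
        float(scale), int(zero_point), torch.quint8)
    self.use_qtable = True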

This is a hacky solution. self.relative_position_bias_table is stored in the model file as an nn.Parameter, and conceptually it is almost the same as a model weight tensor, except that it was auto-generated.