Can't load model after dynamic quantization

Hello! I’m trying to do dynamic quantization as described here.

To quantize my own fine-tuned BERT model, I do this:

import torch
from transformers import BertForSequenceClassification

# Dynamic quantization only runs on CPU.
model = BertForSequenceClassification.from_pretrained('model_dir')
model.to('cpu')

# Swap every nn.Linear for a dynamically quantized int8 version.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)
quantized_model.save_pretrained('model_dir_q')
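
For reference, an illustrative sanity check (not from the tutorial) to confirm the swap actually happened is to list the Linear modules after quantization:

# After quantize_dynamic, every float nn.Linear should have been replaced
# by a dynamically quantized counterpart (torch.nn.quantized.dynamic.Linear).
for name, module in quantized_model.named_modules():
    if 'Linear' in type(module).__name__:
        print(name, '->', type(module).__name__)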

But when I later load the model, I get this error:

  File "/.../bert.py", line 329, in main
    model = BertForSequenceClassification.from_pretrained(args.output_dir)
  File "/.../transformers/modeling_utils.py", line 486, in from_pretrained
    model.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for BertForSequenceClassification:
	While copying the parameter named "bert.encoder.layer.0.attention.self.query.weight", whose dimensions in the model are torch.Size([768, 768]) and whose dimensions in the checkpoint are torch.Size([768, 768]).
	While copying the parameter named "bert.encoder.layer.0.attention.self.key.weight", whose dimensions in the model are torch.Size([768, 768]) and whose dimensions in the checkpoint are torch.Size([768, 768]).
...

I'd appreciate any guidance on how to fix this.

At the very least, the error message seems misleading: the model and checkpoint dimensions it reports are identical. Maybe @jerryzh168 knows what's going on.

I should add that I'm using PyTorch 1.3.1, in case it matters.
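
In case it helps with diagnosis: my guess is that from_pretrained constructs a plain float model, while save_pretrained wrote the state dict of the quantized modules, so the two no longer line up. A quick check along these lines (illustrative only; pytorch_model.bin is the file save_pretrained writes) should show any key mismatch:

import torch
from transformers import BertForSequenceClassification

# Float model, as from_pretrained would construct it before loading weights.
float_keys = set(
    BertForSequenceClassification.from_pretrained('model_dir').state_dict().keys())

# Raw state dict that save_pretrained wrote for the quantized model.
quant_keys = set(torch.load('model_dir_q/pytorch_model.bin').keys())

print('only in quantized checkpoint:', sorted(quant_keys - float_keys)[:10])
print('only in float model:', sorted(float_keys - quant_keys)[:10])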

@JeffO Is this model available publicly so that we can try the steps locally?

Hi @dskhudia, my model isn't public, but you can reproduce the error with plain bert-base-uncased like this:

import torch
from transformers import BertConfig, BertForSequenceClassification

config = BertConfig.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased', config=config)
model.to('cpu')

# Quantize and save exactly as above.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)
quantized_model.save_pretrained('model_dir_q')

# Attempt to reload the quantized checkpoint.
model = BertForSequenceClassification.from_pretrained('model_dir_q')

I also ran into the same error.

I tried again with PyTorch 1.4.0. The previous error no longer occurs, but quantization breaks the model: performance on my task drops from 68.6% to 3.8%.

I’ll dig deeper to see if I can figure out why, but any suggestions would be greatly appreciated.
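
In the meantime, the workaround I plan to try (just a sketch, untested; the file name is arbitrary) is to skip save_pretrained/from_pretrained for the quantized weights entirely: serialize the state dict with torch.save, then re-apply quantize_dynamic to a freshly loaded float model before load_state_dict, so the module structure matches the checkpoint:

import torch
from transformers import BertForSequenceClassification

# Save side: quantize, then serialize the quantized state dict directly.
model = BertForSequenceClassification.from_pretrained('model_dir')
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)
torch.save(quantized_model.state_dict(), 'quantized_state_dict.pt')

# Load side: rebuild the float model, apply the same quantization so the
# module structure matches, and only then load the quantized weights.
model = BertForSequenceClassification.from_pretrained('model_dir')
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)
quantized_model.load_state_dict(torch.load('quantized_state_dict.pt'))
quantized_model.eval()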