First of all, I would like to thank you for the awesome torch.quantization. At the moment, however, quantization of embeddings is not supported, although they are usually one of the largest parts of the model (in terms of size) in NLP.
I tried to treat nn.Embedding like nn.Linear, since the two are very similar in nature, but I get the following error:
RuntimeError: Could not run 'aten::index_select' with arguments from the 'QuantizedCPUTensorId' backend. 'aten::index_select' is only available for these backends: [CPUTensorId, CUDATensorId, SparseCPUTensorId, SparseCUDATensorId, VariableTensorId].
So I’m interested in whether there are plans to support nn.Embedding quantization.
@jerryzh168
Hey, I saw that Embedding quantization was added in 1.7.0, but I can’t reproduce it with the latest version. I tried both static and dynamic quantization. Can you please share a code snippet that converts Embeddings to int8?
A small repro would be great. For now, can you try setting the qconfig for the embedding module to float_qparams_weight_only_qconfig? We only support float qparams quantization for Embedding layers. If you use the default qconfig for Embedding layers, you may run into this error.
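A minimal sketch of that suggestion, assuming a PyTorch release in which torch.quantization exports float_qparams_weight_only_qconfig. The TinyEmbedder module and its sizes are hypothetical and only serve as an illustration:

```python
import torch
import torch.nn as nn
from torch.quantization import (
    convert,
    float_qparams_weight_only_qconfig,
    prepare,
)


class TinyEmbedder(nn.Module):
    """Hypothetical toy model: a single embedding table."""

    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(num_embeddings=10, embedding_dim=12)

    def forward(self, indices):
        return self.emb(indices)


model = TinyEmbedder().eval()

# Weight-only quantization with float qparams, set on the embedding only.
model.emb.qconfig = float_qparams_weight_only_qconfig

prepare(model, inplace=True)   # attaches observers according to the qconfig
convert(model, inplace=True)   # swaps nn.Embedding for its quantized version

indices = torch.randint(0, 10, (4,))  # 1-D LongTensor of row indices
out = model(indices)                  # float output, one row per index
```

Only the weights are quantized here, so the embedding still takes ordinary integer indices and returns a float tensor.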
However, according to this file, float_qparams_weight_only_qconfig is part of torch.quantization. With the PyTorch 1.7.1 CPU build, torch.quantization.float_qparams_weight_only_qconfig cannot be imported!
The error occurs because you are passing an FP32 input tensor to a quantized operator. If you change the model to include quant/dequant stubs, it should work as expected.
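A rough sketch of the quant/dequant stub pattern, assuming an eager-mode static quantization flow. The EmbLinear module, its layer sizes, and the random calibration batch are hypothetical. The embedding returns a float tensor, so a QuantStub sits between it and the quantized linear layer:

```python
import torch
import torch.nn as nn
from torch.quantization import (
    DeQuantStub,
    QuantStub,
    convert,
    default_qconfig,
    float_qparams_weight_only_qconfig,
    prepare,
)


class EmbLinear(nn.Module):
    """Hypothetical toy model: embedding followed by a linear layer."""

    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(10, 12)
        self.quant = QuantStub()      # float -> quantized
        self.fc = nn.Linear(12, 5)
        self.dequant = DeQuantStub()  # quantized -> float

    def forward(self, indices):
        x = self.emb(indices)  # float output, even after conversion
        x = self.quant(x)      # quantize before the quantized linear op
        x = self.fc(x)
        return self.dequant(x)


model = EmbLinear().eval()
model.qconfig = default_qconfig                        # linear layer and stubs
model.emb.qconfig = float_qparams_weight_only_qconfig  # embedding override

prepare(model, inplace=True)
with torch.no_grad():
    model(torch.randint(0, 10, (8,)))  # calibration pass on dummy indices
convert(model, inplace=True)

out = model(torch.randint(0, 10, (8,)))
```

Without the QuantStub, the converted fc layer would receive the FP32 output of the embedding directly, which is the situation described above.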
Seems to work now, but I am getting an error when slicing my indices tensor after the model has been calibrated and quantized. The same slicing always worked before quantization.
emb(Xi[:, i - self.num, :])
Error:
RuntimeError: Expect weight, indices, and offsets to be contiguous.
Using torch.LongTensor(128, 1).random_(0, 10), which yields the same tensor shape as the sliced input, works, so the tensor slicing seems to be what causes the problem.
It seems like this was recently modified in https://github.com/pytorch/pytorch/pull/48993. The operator expects the values passed in to the embedding operator to be contiguous.
You could check the inputs by doing x.is_contiguous() and call x.contiguous() if they are not.
I’ll file an issue to support this in the operator itself.
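As an illustration of that check, here is a small sketch. The shape of Xi and the helper name ensure_contiguous are assumptions made for the example and are not taken from the original model:

```python
import torch


def ensure_contiguous(x: torch.Tensor) -> torch.Tensor:
    """Return x unchanged if it is contiguous, otherwise a contiguous copy."""
    return x if x.is_contiguous() else x.contiguous()


# Hypothetical index tensor: batch of 128, 4 categorical fields, 1 index each.
Xi = torch.randint(0, 10, (128, 4, 1))

# Selecting one field along the middle dimension keeps the original strides,
# so the resulting (128, 1) view is not contiguous in memory.
sliced = Xi[:, 2, :]
print(sliced.is_contiguous())

# The copy has the same values and shape but a contiguous memory layout.
fixed = ensure_contiguous(sliced)
print(fixed.is_contiguous())
```

A tensor created directly with the target shape, such as torch.LongTensor(128, 1).random_(0, 10), is already contiguous, which would explain why it does not trigger the error.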
Is it somehow possible to exclude embedding layers from post-training static quantization, so that only the linear layers, for instance, are quantized, as in earlier versions?
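One possible approach, sketched under the assumption that eager-mode convert leaves modules whose qconfig is None untouched: set qconfig = None on every nn.Embedding before calling prepare. The helper disable_embedding_quantization and the toy model are hypothetical:

```python
import torch
import torch.nn as nn
from torch.quantization import (
    DeQuantStub,
    QuantStub,
    convert,
    default_qconfig,
    prepare,
)


def disable_embedding_quantization(model: nn.Module) -> None:
    """Hypothetical helper: opt every nn.Embedding out of quantization."""
    for module in model.modules():  # also reaches nested submodules
        if isinstance(module, nn.Embedding):
            module.qconfig = None


class EmbLinear(nn.Module):
    """Hypothetical toy model: float embedding, quantized linear layer."""

    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(10, 12)
        self.quant = QuantStub()
        self.fc = nn.Linear(12, 5)
        self.dequant = DeQuantStub()

    def forward(self, indices):
        x = self.quant(self.emb(indices))
        return self.dequant(self.fc(x))


model = EmbLinear().eval()
model.qconfig = default_qconfig
disable_embedding_quantization(model)  # must run before prepare()

prepare(model, inplace=True)
with torch.no_grad():
    model(torch.randint(0, 10, (8,)))  # calibration pass on dummy indices
convert(model, inplace=True)

out = model(torch.randint(0, 10, (8,)))
```

Walking model.modules() matters for larger models, because an embedding nested inside a submodule would otherwise inherit the qconfig of its parent.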
Hi
I have the same error. I am trying to quantize DETR.
Error:
"AssertionError: The only supported dtype for nnq.Embedding is torch.quint8".
class DETR(nn.Module):
    """ This is the DETR module that performs object detection """
    def __init__(self, backbone, transformer, num_classes, num_queries, aux_loss=False):
        """ Initializes the model.
        Parameters:
            backbone: torch module of the backbone to be used. See backbone.py
            transformer: torch module of the transformer architecture. See transformer.py
            num_classes: number of object classes
            num_queries: number of object queries, i.e. detection slots. This is the maximal number of objects
                         DETR can detect in a single image. For COCO, we recommend 100 queries.
            aux_loss: True if auxiliary decoding losses (loss at each decoder layer) are to be used.
        """
        super().__init__()
        self.num_queries = num_queries
        self.transformer = transformer
        hidden_dim = transformer.d_model
        self.class_embed = nn.Linear(hidden_dim, num_classes + 1)
        self.bbox_embed = MLP(hidden_dim, hidden_dim, 4, 3)
        self.query_embed = nn.Embedding(num_queries, hidden_dim)
        self.query_embed.qconfig = None  # --------------------------------------------------
        self.qconfig = default_qconfig
        self.input_proj = nn.Conv2d(backbone.num_channels, hidden_dim, kernel_size=1)
        self.backbone = backbone
        self.aux_loss = aux_loss
    def forward(self, samples: NestedTensor):
        """ The forward expects a NestedTensor, which consists of:
               - samples.tensor: batched images, of shape [batch_size x 3 x H x W]
               - samples.mask: a binary mask of shape [batch_size x H x W], containing 1 on padded pixels
            It returns a dict with the following elements:
               - "pred_logits": the classification logits (including no-object) for all queries.
                                Shape= [batch_size x num_queries x (num_classes + 1)]
               - "pred_boxes": The normalized boxes coordinates for all queries, represented as
                               (center_x, center_y, height, width). These values are normalized in [0, 1],
                               relative to the size of each individual image (disregarding possible padding).
                               See PostProcess for information on how to retrieve the unnormalized bounding box.
               - "aux_outputs": Optional, only returned when auxiliary losses are activated. It is a list of
                                dictionaries containing the two above keys for each decoder layer.
        """
        samples = self.quant(samples)  # ------------------------------------------------
        if isinstance(samples, (list, torch.Tensor)):
            samples = nested_tensor_from_tensor_list(samples)
        features, pos = self.backbone(samples)
        src, mask = features[-1].decompose()
        assert mask is not None
        hs = self.transformer(self.input_proj(src), mask, self.query_embed.weight, pos[-1])[0]
        outputs_class = self.class_embed(hs)
        outputs_coord = self.bbox_embed(hs).sigmoid()
        out = {'pred_logits': outputs_class[-1], 'pred_boxes': outputs_coord[-1]}
        if self.aux_loss:
            out['aux_outputs'] = self._set_aux_loss(outputs_class, outputs_coord)
        out = self.dequant(out)  # -------------------------------------------------------------
        return out
Hi @supriyar,
I tried it this way and got the same error as before.
I’m trying to quantize the DETR model, but I ran into this error. Is there something else I should try?