I tried to use torch.autograd.grad() to calculate gradients for a quantized model, just as we usually do on full-precision models:
for idx, (inputs, targets) in enumerate(data_loader):
    with torch.enable_grad():
        inputs.requires_grad = True
        outputs = quantized_model(inputs)
        loss = criterion(outputs, targets)
        grads = torch.autograd.grad(loss, inputs)[0]
But I got a RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.
Do models quantized with PyTorch quantization currently not support backpropagation? Is there some method I can use to calculate gradients for PyTorch quantized models?
Quantized models currently run only during inference, so you can only call forward on them. If you are trying out quantization-aware training (see Quantization Recipe — PyTorch Tutorials 1.9.1+cu102 documentation), we do support back-propagation in that case during training.
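To illustrate the point above, here is a minimal sketch of eager-mode quantization-aware training in which gradients do flow. The SmallNet module and its shapes are made up for illustration; the torch.quantization calls match the 1.9-era API referenced in the tutorial link.

```python
import torch
import torch.nn as nn


class SmallNet(nn.Module):
    """Toy model wrapped in Quant/DeQuant stubs for eager-mode QAT."""

    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(8, 2)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))


model = SmallNet().train()
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
qat_model = torch.quantization.prepare_qat(model)

# Back-propagation works here because prepare_qat inserts FakeQuantize
# modules that operate on fp32 tensors instead of real quantized kernels.
x = torch.randn(4, 8, requires_grad=True)
loss = qat_model(x).sum()
grads = torch.autograd.grad(loss, x)[0]
```

After training, the model would be converted to a real quantized model with torch.quantization.convert, at which point forward-only inference applies again.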
Thank you for the reply. I know that quantization-aware training uses fake quantization during training, which simulates quantization with fp32. What is the difference between fake quantization and real quantization, especially when we do back-propagation on them?
Fake quantization simulates quantization but uses high-precision data types.
So, for example, imagine you were trying to quantize to integers.
Mathematically, a quantized linear op would be:

    X = round(X).to(int)
    weight = round(weight).to(int)
    out = X * weight

whereas a fake_quantized linear would be:

    X = round(X).to(fp32)
    weight = round(weight).to(fp32)
    out = X * weight
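A detail the pseudocode above glosses over is how back-propagation gets through round(), which has zero gradient almost everywhere. Fake quantization handles this with a straight-through estimator. Below is a minimal sketch (not PyTorch's actual FakeQuantize implementation, and the scale of 0.1 is arbitrary) showing the idea:

```python
import torch


class FakeQuantSTE(torch.autograd.Function):
    """Fake quantization with a straight-through estimator."""

    @staticmethod
    def forward(ctx, x, scale):
        # Simulate int8 quantization, but keep the result in fp32.
        return torch.clamp(torch.round(x / scale), -128, 127) * scale

    @staticmethod
    def backward(ctx, grad_out):
        # Straight-through: pretend round() (and clamp) were the identity,
        # so the incoming gradient passes through unchanged.
        return grad_out, None


x = torch.randn(3, requires_grad=True)
y = FakeQuantSTE.apply(x, 0.1).sum()
y.backward()
# x.grad exists and is all ones: gradients flow despite round()
```

This is why back-propagation works on fake-quantized models: the forward pass sees quantization error, while the backward pass behaves as if the op were differentiable.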
In practice, quantized weights are stored as quantized tensors in a packed format designed to make quantized operations run quickly, which makes them difficult to interact with directly.
Fake-quantized weights are stored as floats, so you can interact with them easily in order to do gradient updates.
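You can see the contrast directly: a real quantized tensor exposes its integer representation but cannot take part in autograd. A small sketch (scale and zero_point chosen arbitrarily):

```python
import torch

w = torch.randn(2, 2)
qw = torch.quantize_per_tensor(w, scale=0.1, zero_point=0, dtype=torch.qint8)

qw.int_repr()     # the underlying int8 values
qw.dequantize()   # back to an fp32 tensor for inspection

# Asking a quantized tensor to track gradients fails, since autograd
# only supports floating-point (and complex) dtypes.
try:
    qw.requires_grad_(True)
    grad_ok = True
except RuntimeError:
    grad_ok = False
```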