Modifying weight update gradient in my training loop

Hello,

I am working on a project in which I have to study the impact of low-precision (int8) quantization on accuracy.
I found a work presented at NeurIPS 2018 that quantizes the gradient used for backprop and keeps the gradient for the weight update in full precision (FP32): [GitHub - eladhoffer/quantized.pytorch](https://github.com/eladhoffer/quantized.pytorch)

I also want to quantize the weight-update gradient before the update operation, and I was wondering whether it is better to quantize out1 through autograd, as done in that work for the backprop gradient (quantize.py file), or to fetch and modify the parameter gradients through .grad.

Modifying the GitHub code:

def conv2d_biprec(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1, num_bits_grad=None):
    # out1: input is detached, so its backward pass carries the gradient w.r.t. weight/bias
    out1 = F.conv2d(input.detach(), weight, bias,
                    stride, padding, dilation, groups)
    out1 = quantize_grad(out1, num_bits=num_bits_grad)  # added: quantize the weight-update gradient
    # out2: weight/bias are detached, so its backward pass carries the gradient w.r.t. the input
    out2 = F.conv2d(input, weight.detach(), bias.detach() if bias is not None else None,
                    stride, padding, dilation, groups)
    out2 = quantize_grad(out2, num_bits=num_bits_grad)  # as in the original code: quantize the backprop gradient
    return out1 + out2 - out1.detach()


def linear_biprec(input, weight, bias=None, num_bits_grad=None):
    # out1: input is detached, so its backward pass carries the gradient w.r.t. weight/bias
    out1 = F.linear(input.detach(), weight, bias)
    out1 = quantize_grad(out1, num_bits=num_bits_grad)  # added: quantize the weight-update gradient
    # out2: weight/bias are detached, so its backward pass carries the gradient w.r.t. the input
    out2 = F.linear(input, weight.detach(), bias.detach()
                    if bias is not None else None)
    out2 = quantize_grad(out2, num_bits=num_bits_grad)  # as in the original code: quantize the backprop gradient
    return out1 + out2 - out1.detach()

Modifying the gradients using .grad:

if training:
    optimizer.zero_grad()
    loss.backward()
    for p in model.parameters():
        p.grad = gemmlowpquantization(p.grad, 8)
    optimizer.step()

The two approaches don’t have the same impact on training (speed, final accuracy) when I try them, so I would like to know which one is more rigorous.

Thanks in advance for your help.

Mathieu

It’s recommended to quantize the gradients directly via the .grad attribute of each parameter tensor after calling loss.backward().
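A minimal sketch of that approach (fake_quantize_int8 below is just an illustrative symmetric quantize-dequantize stand-in; substitute your own gemmlowpquantization):

import torch

def fake_quantize_int8(t):
    # illustrative symmetric int8 quantize-dequantize, not the repo's implementation
    scale = t.abs().max().clamp(min=1e-8) / 127.0
    return (t / scale).round().clamp(-128, 127) * scale

if training:
    optimizer.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:  # skip frozen / unused parameters
                p.grad.copy_(fake_quantize_int8(p.grad))
    optimizer.step()

Copying in place under torch.no_grad() keeps the same gradient tensors the optimizer references and ensures no autograd graph is built for the quantization op.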


Thank you for your answer. I’ll use the second method.

You’re welcome, and that sounds good.