I’m working on quantizing models and I came across a technical issue. I would like for any given nn.Module to quantize/modify the tensor at each step of the forward ops.

Is there a way to do so from outside the model, i.e. without modifying its forward method? If I understood correctly, I can modify the forward like this (where quantization would be binarization with the sign function):

def forward(self, x):
    x = self.fc(x)
    # here I could quantize before the non-linearity if needed
    x = self.relu(x)
    x = sign(x)  # quantize after the non-linearity
    return x

As we can see, this quickly becomes tedious for complex models, whereas the goal is to avoid touching the network definition at all.
The use case would be to seamlessly apply the quantization procedure to any given model, so as to quickly test the results across a wide range of diverse models.

To be more precise, suppose I want to apply a sign function (binarization). I can iterate over model.modules() and binarize each module's weight. How can I do the same for the resulting tensors at each op? And if that is doable, can we differentiate between the layer types (linear, conv, relu, etc.)?
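For reference, here is roughly what I mean by playing with the weights (just a sketch, assuming a plain in-place torch.sign binarization; binarize_weights is a name I made up):

```python
import torch
import torch.nn as nn

def binarize_weights(model: nn.Module) -> None:
    # Replace each weight with its sign, in place and without gradient
    # tracking; isinstance lets us target only the layer types we care about.
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, (nn.Linear, nn.Conv2d)):
                m.weight.copy_(torch.sign(m.weight))

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
binarize_weights(model)
```

This handles the weights; the open question is the equivalent for each op's output tensor.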

Let me know if I can provide any additional information!
Thanks !

If I understand correctly, you can just use the outputs from the forward pass without any modification, and then apply the sign function on the outputs:

output = model(x)
q = sign(output)

Also, note that you don't have to apply relu() and then sign(): relu() converts the negative values to zero and leaves the positive values untouched. As a result, if you apply relu and sign together, your final outputs will only be 0 and 1, so you will never see -1 in the output.
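A quick illustration on a toy tensor to show the difference:

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

print(torch.sign(x))              # -1, -1, 0, 1, 1: the negative side is kept
print(torch.sign(torch.relu(x)))  # 0, 0, 0, 1, 1: the -1s are lost
```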

Hi, thanks for your reply!
I admit that using relu as the non-linearity (or quantizing at that step) may not have been the best example; thanks for pointing it out.

To be more precise about the problem at hand, I need to diagnose whether the model will still perform acceptably when each layer uses only a few bits (ideally binarized). Applying the sign only at the output would still allow the model to use FP32/FP16 at each hidden layer.
The goal is to artificially constrain each layer's output bitwidth, as if the model were operating in a binary setting.

See e.g. https://github.com/itayhubara/BinaryNet.pytorch where they subclass nn.Conv2d and nn.Linear. This is exactly the same use case, but I'd like to avoid meddling with each implementation to replace nn.Conv2d with its binarized alter ego: using an external class to manage the quantization seems much "cleaner" and would allow me to rapidly test a range of different quantization schemes.

I see your point! I guess the easiest way is to write custom layers whose forward() applies sign() before returning the output. So, for example, instead of using nn.Conv2d, you can define your own MyConv2d that applies sign() in its forward, and then change all the layers built with nn.Conv2d to MyConv2d. Similarly, you can define a MyLinear class that does the FC computation but additionally applies sign().
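A minimal sketch of what that could look like (MyConv2d/MyLinear are placeholder names; note that BinaryNet.pytorch additionally uses a straight-through estimator so the sign is trainable, which this sketch omits since sign() has zero gradient almost everywhere):

```python
import torch
import torch.nn as nn

class MyConv2d(nn.Conv2d):
    # Same constructor as nn.Conv2d; only the forward output is binarized.
    def forward(self, x):
        return torch.sign(super().forward(x))

class MyLinear(nn.Linear):
    # Same constructor as nn.Linear; only the forward output is binarized.
    def forward(self, x):
        return torch.sign(super().forward(x))

# Drop-in usage: swap the layer classes, keep the rest of the model as-is.
model = nn.Sequential(
    MyConv2d(3, 8, kernel_size=3),     # 32x32 input -> 30x30 feature maps
    nn.Flatten(),
    MyLinear(8 * 30 * 30, 10),
)
out = model(torch.randn(1, 3, 32, 32))  # every element is in {-1, 0, 1}
```

The drawback, as discussed above, is that every nn.Conv2d/nn.Linear in the model definition still has to be replaced by hand.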