# Quantizing layer outputs from outside the nn.Module object

Hi! I’m working on quantizing models and I came across a technical issue. For any given nn.Module, I would like to quantize/modify the tensor at each step of the forward ops.

Is there a way to do so from outside the model, i.e. without modifying its forward method? If I understood well, I can modify the forward like this (here the quantization would be binarization with the sign function):

```python
def forward(self, x):
    x = self.fc(x)
    # here I could quantize before the non-linearity if needed
    x = self.relu(x)
    x = sign(x)  # quantize after the non-linearity
    return x
```

As we can see, this becomes tedious if the model is complex, whereas the goal would be to not intervene in the definition of the network at all.
The use case would be to seamlessly apply the quantization procedure to any given model, so as to quickly test the results across a wide range of diverse models.

To be more precise, let’s suppose I want to apply a sign function (binarization). I can iterate over `model.modules()` and binarize each module’s `weight` to binarize the weights. How can I do the same for the resulting tensor at each op? And if it is doable, can we differentiate between the layer types (linear, conv, relu, etc.)?
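For reference, one way to modify a layer’s output without touching its `forward` is PyTorch’s forward hooks: a function registered with `register_forward_hook` receives the module’s output and, if it returns a tensor, that tensor replaces the output. A minimal sketch (the `Net` model and the choice of `nn.Linear` as the layer type to binarize are just illustrations):

```python
import torch
import torch.nn as nn

# Hypothetical toy model, standing in for any existing network.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)
        self.relu = nn.ReLU()
        self.out = nn.Linear(4, 2)

    def forward(self, x):
        return self.out(self.relu(self.fc(x)))

def binarize_hook(module, inputs, output):
    # Returning a tensor from a forward hook replaces the module's output.
    return torch.sign(output)

model = Net()
handles = []
for m in model.modules():
    # Choose which layer types to quantize; all others pass through untouched.
    if isinstance(m, nn.Linear):
        handles.append(m.register_forward_hook(binarize_hook))

y = model(torch.randn(3, 4))  # every Linear output is now in {-1, 0, 1}

# The hooks can be removed later to restore the original behavior.
for h in handles:
    h.remove()
```

The `isinstance` check is also how one could differentiate between layer types (linear, conv, relu, etc.) when deciding what to quantize.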

Let me know if I can input any necessary information !
Thanks!

If I understand correctly, you can just use the outputs from the forward pass without any modification, and then apply the sign function on the outputs:

```python
output = model(x)
q = sign(output)
```

Also, note that you don’t have to apply `relu()` and then `sign()`. `relu()` converts the negative values to zero and leaves the positive values untouched. As a result, if you apply `relu` and `sign` together, your final outputs will only be `0` and `1`, so you will miss the `-1` in the output.
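A quick check of that point on a toy tensor, using PyTorch’s built-in `torch.relu` and `torch.sign`:

```python
import torch

x = torch.tensor([-2.0, 0.0, 3.0])
torch.sign(torch.relu(x))  # tensor([0., 0., 1.]) -- the -1 is lost
torch.sign(x)              # tensor([-1., 0., 1.])
```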

Hi, thanks for your reply! I admit that using `relu` as the non-linearity (or quantizing at that step) may not have been the best example, thanks for pointing it out. To be more precise about the problem at hand: I need to diagnose whether the model keeps acceptable performance when using only a few bits at each layer (ideally binarized). Applying the quantization only at the final output would still let the model use FP32/FP16 in every hidden layer.
The goal is to artificially constrain a layer’s output bitwidth, as if the model were operating in a binary setting.

See e.g. https://github.com/itayhubara/BinaryNet.pytorch, where they subclass `nn.Conv2d` and `nn.Linear`. This is exactly the same use case, but I’d like to avoid meddling with each implementation to replace `nn.Conv2d` with its binarized alter ego: managing the quantization from an external class seems much cleaner and would make it quick to test a range of different quantization solutions.

I see your point! So, I guess the easiest way is to write custom layers and define their `forward()` functions to apply the `sign()` before returning the output. For example, instead of using `nn.Conv2d`, you can define your own `MyConv2d` that applies the `sign()` function in its forward, and then change all the layers built with `nn.Conv2d` to `MyConv2d`. Similarly, you can define a `MyLinear` class that does the FC layer but additionally applies the `sign()` function.
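A sketch of that suggestion, combined with a small helper that swaps the layers in an existing model so its definition does not have to change (`MyConv2d` and `binarize_convs` are hypothetical names, and the helper only copies the most common `nn.Conv2d` hyperparameters):

```python
import torch
import torch.nn as nn

# Subclass whose forward applies sign() to the standard convolution output.
class MyConv2d(nn.Conv2d):
    def forward(self, x):
        return torch.sign(super().forward(x))

# Hypothetical helper: replace every nn.Conv2d in a model with MyConv2d,
# copying the trained weights over, so the model definition stays untouched.
def binarize_convs(model):
    for name, child in model.named_children():
        if isinstance(child, nn.Conv2d):
            new = MyConv2d(child.in_channels, child.out_channels,
                           child.kernel_size, stride=child.stride,
                           padding=child.padding, dilation=child.dilation,
                           groups=child.groups, bias=child.bias is not None)
            new.load_state_dict(child.state_dict())
            setattr(model, name, new)
        else:
            binarize_convs(child)  # recurse into nested containers
    return model

model = binarize_convs(nn.Sequential(nn.Conv2d(1, 2, 3), nn.ReLU()))
y = model(torch.randn(1, 1, 8, 8))  # conv outputs are binarized in-place
```

The same pattern would give a `MyLinear` for `nn.Linear` layers.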
