This paper introduces a new activation function with a trainable parameter, used for quantizing activations (Eqs. 1, 2, and 3).

The quantization is as follows:

`y = 0.5(|x| - |x - alpha| + alpha)`

`y_q = round(y * (2^k - 1) / alpha) * alpha / (2^k - 1)`,

where `alpha` is the trainable parameter and `k` is the number of bits.
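
For concreteness, this is how I currently read the forward pass in PyTorch (an untested sketch; the function name `quantize_activation` is mine, not from the paper):

```python
import torch

def quantize_activation(x, alpha, k):
    # Clip to [0, alpha]: y = 0.5 * (|x| - |x - alpha| + alpha)
    y = 0.5 * (x.abs() - (x - alpha).abs() + alpha)
    # Linear k-bit quantization over [0, alpha]
    scale = (2 ** k - 1) / alpha
    return torch.round(y * scale) / scale
```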

The partial derivative of `y_q` with respect to `alpha` is given in Eq. 3 of the paper.
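
If I read Eq. 3 correctly, applying the straight-through estimator (treating `round(.)` as the identity, so `y_q ≈ y`) reduces it to

`d(y_q)/d(alpha) = 0 if x < alpha, else 1`,

i.e. only activations that get clipped at `alpha` contribute to the gradient of `alpha`.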

What is the easiest way to integrate this activation function into a PyTorch model?

I was thinking of defining an `nn.Module` that holds `alpha` as a `Parameter`. The problem is that there are two sets of gradients here: one for updating `alpha` and another for the gradients with respect to the inputs. I assume the latter should be handled in a custom `backward()`, but I'm not sure how to update `alpha`.
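
For reference, here is a rough sketch of the pattern I had in mind (untested; the class names and the straight-through treatment of `round()` are my own reading of the paper, not something it prescribes). A custom `torch.autograd.Function` returns one gradient per `forward()` argument, so the gradient for `alpha` is simply another return value, and the wrapping `nn.Module` owns `alpha` as a `Parameter`:

```python
import torch
import torch.nn as nn

class ActQuantFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, alpha, k):
        ctx.save_for_backward(x, alpha)
        # Same forward computation as the sketch above
        y = 0.5 * (x.abs() - (x - alpha).abs() + alpha)
        scale = (2 ** k - 1) / alpha
        return torch.round(y * scale) / scale

    @staticmethod
    def backward(ctx, grad_output):
        x, alpha = ctx.saved_tensors
        # Straight-through estimator: treat round() as the identity and
        # differentiate the clipping function y instead of y_q.
        # dy/dx = 1 inside the clipping range [0, alpha), else 0
        grad_x = grad_output * ((x >= 0) & (x < alpha)).to(grad_output.dtype)
        # d(y_q)/d(alpha) = 1 where x >= alpha, else 0 (my reading of Eq. 3);
        # summed because the scalar alpha broadcasts over the whole tensor
        grad_alpha = (grad_output * (x >= alpha).to(grad_output.dtype)).sum()
        # One gradient per forward() argument; k is not differentiable
        return grad_x, grad_alpha, None

class ActQuant(nn.Module):
    def __init__(self, k=4, alpha_init=10.0):
        super().__init__()
        self.k = k
        self.alpha = nn.Parameter(torch.tensor(alpha_init))

    def forward(self, x):
        return ActQuantFn.apply(x, self.alpha, self.k)
```

If this is right, `loss.backward()` populates `alpha.grad` alongside the input gradients, and any optimizer built over `model.parameters()` updates `alpha` like any other parameter, so nothing special is needed in `backward()` for the update itself. Is this the right approach, or is there a simpler way?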