# How to use an `optimizer` to update non-differentiable parameters?

Assume an `nn.Parameter` enters an equation through a non-differentiable operation, so its gradient needs to be estimated with a straight-through estimator (STE) before the parameter can be updated. For example,

`y = 0.5(|x| - |x - alpha| + alpha)`
`y_q = round(y * (2 ** k - 1) / alpha) * alpha / (2 ** k - 1)`.

In this equation, `alpha` is a trainable parameter and the derivative of `y_q` w.r.t `alpha` needs to be estimated using an STE.

How can I define the approximated gradient and use it in an `optimizer` to update the parameter along with the rest of parameters in the network?
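For concreteness, here is a minimal sketch of the forward pass described by the two equations above (a PACT-style clip to `[0, alpha]` followed by k-bit quantization); the function name `quantize` is illustrative, not from any library. The `round` call has zero gradient almost everywhere, which is why `alpha` receives no useful gradient through `y_q` without an STE:

```python
import torch

def quantize(x, alpha, k):
    # y = 0.5 * (|x| - |x - alpha| + alpha) clips x to [0, alpha]
    y = 0.5 * (x.abs() - (x - alpha).abs() + alpha)
    # y_q = round(y * (2**k - 1) / alpha) * alpha / (2**k - 1)
    scale = (2 ** k - 1) / alpha
    # torch.round has zero gradient a.e., so autograd cannot see
    # how y_q depends on alpha through this step
    y_q = torch.round(y * scale) / scale
    return y_q
```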


Note that it should likely be `**`, not `^`.
I like the trick of `y_q_diffable = y + (y_q - y).detach()`. (In fact, I once proposed to give a lightning talk just on this line of code and where it is useful.)

I always like to credit @hughperkins for sharing the trick here on the forums when he saw it in a paper; he knows a ton of references for applications, too.

Best regards

Thomas


Yes, you're right. It should be `**`.

Do you mind explaining this trick a little bit more in detail?

`y_q_diffable` is `y_q` (= `y + y_q - y`) in the forward pass. But during the backward pass, the gradients propagate as if `y_q_diffable` were `y`.
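Putting this together with the question's equations, a minimal sketch (illustrative names and values, not from any library) of using the trick with an ordinary optimizer might look like this. In the forward pass `y_q_diffable` equals `y_q` exactly; in the backward pass the detached term contributes nothing, so the gradient w.r.t. `alpha` flows through `y` alone, and `opt.step()` updates `alpha` like any other parameter:

```python
import torch

alpha = torch.nn.Parameter(torch.tensor(1.0))
opt = torch.optim.SGD([alpha], lr=0.1)
k = 4

x = torch.tensor([0.2, 0.6, 1.5, 2.0])
y = 0.5 * (x.abs() - (x - alpha).abs() + alpha)  # clips x to [0, alpha]
scale = (2 ** k - 1) / alpha
y_q = torch.round(y * scale) / scale             # round() blocks gradients
y_q_diffable = y + (y_q - y).detach()            # forward: y_q, backward: y

loss = y_q_diffable.sum()                        # toy loss for illustration
loss.backward()                                  # alpha.grad comes via y
opt.step()                                       # alpha updated via the STE
```

Note that only the inputs above `alpha` (here `1.5` and `2.0`) contribute to `alpha.grad`, since `y = clip(x, 0, alpha)` depends on `alpha` only where the clip is active.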

Best regards

Thomas


@tom I am curious, did you prepare that talk in the end? Would you mind sharing some references for applications?

No, I didn't, but my favourite application, which I use in my autograd course, is to emulate quantization-aware training with it. The course is not freely available, but the particular example is also included in the ACDL "Advanced introduction to PyTorch" talk, for which I published the slides. I don't know of any video recording, and there wasn't enough interest to re-record it back then.
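For readers curious how the trick emulates quantization-aware training: the idea is to insert a "fake quantization" step that quantizes in the forward pass while letting gradients pass straight through. A hypothetical sketch (the `FakeQuant` module and its assumptions, e.g. inputs in `[0, 1]`, are illustrative only):

```python
import torch

class FakeQuant(torch.nn.Module):
    """Quantize to 2**k levels in forward; identity in backward."""

    def __init__(self, k=8):
        super().__init__()
        self.levels = 2 ** k - 1

    def forward(self, x):
        # assumes x lies in [0, 1]
        x_q = torch.round(x * self.levels) / self.levels
        # forward: x_q, backward: gradient of x (the STE trick)
        return x + (x_q - x).detach()

fq = FakeQuant(k=2)
x = torch.tensor([0.1, 0.4, 0.9], requires_grad=True)
out = fq(x)
out.sum().backward()
# x.grad is all ones: the quantization step is invisible to autograd
```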
