Assume there is a non-differentiable `nn.Parameter` in an equation whose gradient needs to be estimated using a straight-through estimator (STE) before the parameter can be updated. For example, `y = 0.5 * (|x| - |x - alpha| + alpha)` and `y_q = round(y * (2 ** k - 1) / alpha) * alpha / (2 ** k - 1)`. In this…

Stop gradients (for ST gumbel softmax), from https://gist.github.com/ericjang/1001afd374c2c3b7752545ce6d9ed349#file-gumbel-softmax-py-L27 …

How to use an `optimizer` to update non-differentiable parameters?

tom (Thomas V) July 5, 2018, 8:42pm 4

In the forward pass, `y_q_diffable` equals `y_q` (since y + y_q - y = y_q). In the backward pass, however, the gradients propagate as if `y_q_diffable` were `y`.
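The expression that defines `y_q_diffable` is not visible in this thread, so the following is only a minimal sketch of one common way to get this behaviour: detaching the difference `y_q - y` from the graph. The function name `quantize_ste`, the sample values of `x`, `alpha` and `k`, and the choice of SGD are all assumptions made for illustration, not taken from the original posts.

```python
import torch

def quantize_ste(x, alpha, k):
    # y = 0.5 * (|x| - |x - alpha| + alpha), which clips x to [0, alpha]
    y = 0.5 * (x.abs() - (x - alpha).abs() + alpha)
    # y_q = round(y * (2 ** k - 1) / alpha) * alpha / (2 ** k - 1)
    scale = (2 ** k - 1) / alpha
    y_q = torch.round(y * scale) / scale
    # Forward value is y + y_q - y = y_q. The detached term carries no
    # gradient, so backward treats the result as if it were y.
    y_q_diffable = y + (y_q - y).detach()
    return y, y_q, y_q_diffable

# Illustrative inputs (assumed values)
x = torch.tensor([-0.5, 0.3, 0.8, 1.7])
alpha = torch.nn.Parameter(torch.tensor(1.0))
k = 2

optimizer = torch.optim.SGD([alpha], lr=0.1)
optimizer.zero_grad()

y, y_q, y_q_diffable = quantize_ste(x, alpha, k)
loss = y_q_diffable.sum()  # placeholder loss, just to get a gradient
loss.backward()

print(y_q_diffable)  # quantized values in the forward pass
print(alpha.grad)    # gradient computed as if the output were y

optimizer.step()     # alpha is updated like any other parameter
```

With this pattern `round` never appears on the gradient path, so `alpha.grad` is populated and a regular optimizer can update `alpha`. In this sketch only the element clipped at `alpha` (here `x = 1.7`) contributes to that gradient.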

Best regards

Thomas
