Circumventing a non-differentiability problem

Sure, but in that case (applying backpropagation in the ordinary way), the loss won't decrease, because the derivative of torch.sign() is 0 almost everywhere, so the gradient it passes back is always 0.
So I got the output below:

========train========
loss:  2.0
loss:  2.0
loss:  2.0
...
loss:  2.0
(the same "loss:  2.0" is printed at every step; the loss never changes)
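
For illustration, here is a minimal sketch (not the original training loop; the tensor values are made up) showing why no gradient reaches the input through torch.sign(), together with a common straight-through-estimator workaround, where sign() is used in the forward pass but gradients are passed through unchanged:

import torch

# The derivative of torch.sign() is 0 almost everywhere,
# so any gradient flowing back through it is zeroed out.
x = torch.tensor([-1.5, 0.3, 2.0], requires_grad=True)
y = torch.sign(x)
y.sum().backward()
print(x.grad)  # tensor([0., 0., 0.]) -- no gradient reaches x

# Straight-through estimator (one common workaround, sketched here):
# forward pass computes sign(x), backward pass acts as the identity.
x = torch.tensor([-1.5, 0.3, 2.0], requires_grad=True)
y = x + (torch.sign(x) - x).detach()
y.sum().backward()
print(x.grad)  # tensor([1., 1., 1.]) -- gradients now flow

With the straight-through trick in place of a bare torch.sign() call, the loss in a loop like the one above would no longer stay stuck at a constant value, since the parameters actually receive non-zero gradients.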