You are right, most functions are still old-style and don't support grad of grad.
There is a temporary fix: use a finite difference in place of the derivative.
x_1 and x_2 are two points sampled from x_hat.
idea from 郑华滨
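
To make the workaround concrete, here is a minimal sketch in plain Python. It assumes the penalty in question is the usual "gradient norm should be 1" term, which the post above does not spell out, and it replaces the gradient norm with the slope of the critic between x_1 and x_2. The names `difference_penalty`, `critic` and `eps` are made up for illustration. The same arithmetic on tensors would only need the critic's forward outputs, so a single ordinary backward pass should be enough.

```python
import math

def difference_penalty(critic, x1, x2, eps=1e-12):
    """Squared deviation from 1 of the critic's slope between x1 and x2.

    Assumption: this stands in for a (||grad|| - 1)^2 style penalty.
    """
    # Euclidean distance between the two sample points.
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))
    # Difference quotient used in place of the gradient norm.
    slope = abs(critic(x1) - critic(x2)) / (dist + eps)
    return (slope - 1.0) ** 2

# Hypothetical linear critic, only here to exercise the function.
def critic(x):
    return 3.0 * x[0] + 4.0 * x[1]

x1 = [0.0, 0.0]
x2 = [1.0, 0.0]
penalty = difference_penalty(critic, x1, x2)  # slope is about 3 here
```

The small `eps` only guards against division by zero when the two points coincide. How far apart x_1 and x_2 should be is left open, since the post does not say.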