also related to custom autograd.Function and backward pass. Inspired by your code,
I modified my code, and it now works in the newer version of PyTorch. However, I do not
clone the input during the forward pass, and it still works. Interesting, but also confusing!
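
One possible explanation, sketched below under assumptions: if forward only reads the input and never modifies it in place, the tensor saved for backward stays valid, so a clone should not be needed. `Square` is a hypothetical toy op made up for illustration, not the code from this thread, and this may not be the whole story for every PyTorch version.

```python
import torch


class Square(torch.autograd.Function):
    """Hypothetical toy op (y = x * x) with a hand-written backward."""

    @staticmethod
    def forward(ctx, x):
        # No clone: x is only read here, never modified in place,
        # so the tensor saved for backward is left untouched.
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # d(x * x)/dx = 2 * x, chained with the incoming gradient.
        return grad_output * 2 * x


x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = Square.apply(x)
y.sum().backward()
grad = x.grad  # gradient of sum(x * x) with respect to x
```

If forward instead modified `x` in place, or returned `x` itself, I would expect the clone (or `ctx.mark_dirty`) to matter again.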