Because you redefined x0, the second x0 is not really the x0 you defined the first time. You are not reusing the same variable; you are rebinding the symbol to a different variable. Your new variable here is an intermediate result, and intermediate results do not retain gradients by default. You can call retain_grad() on it to get its grad: http://pytorch.org/docs/master/autograd.html#torch.autograd.Variable.retain_grad.
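A minimal sketch of what this looks like, written against the current tensor API (where Variable has since been merged into Tensor), so the names and exact calls may differ slightly from your version:

```python
import torch

# Leaf tensor: its gradient is accumulated in .grad after backward().
x0 = torch.randn(3, requires_grad=True)

# Rebinding the name x0 to an operation's output creates a *new*,
# intermediate (non-leaf) tensor; it is not the same variable as before.
x0 = x0 * 2

# Intermediate results do not keep their gradient by default,
# so ask autograd to retain it.
x0.retain_grad()

loss = x0.sum()
loss.backward()

print(x0.grad)  # gradient of the intermediate x0, available thanks to retain_grad()
```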
Thanks for your reply! Now I know how to obtain the gradient at every step.
But I'm still wondering: after I rebind x0 at every step, when I call backward() and apply the gradients with step(), the original x0 (the first-defined one) will still be updated, I just can't print out its gradient, right?
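To make the question concrete, here is a hypothetical one-step illustration of the setup I mean (names are made up; the extra `leaf` handle is only there so the original tensor can still be inspected):

```python
import torch

x0 = torch.randn(3, requires_grad=True)    # the first-defined (leaf) x0
optimizer = torch.optim.SGD([x0], lr=0.1)  # optimizer holds a reference to this leaf

leaf = x0          # keep a handle to the original leaf for inspection
x0 = x0 * 2        # the name x0 now points to an intermediate tensor
x0.retain_grad()
x0.sum().backward()

print(x0.grad)     # gradient of the intermediate (via retain_grad)
print(leaf.grad)   # gradient of the original leaf
optimizer.step()   # updates the original leaf tensor in place
```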