However, I got an error: "One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior."

Could you tell me how to check which tensor is not in the graph?

This will happen if the loss is linear in a parameter and thus the first gradient does not depend on it.
The quickest way to find which one is to pass allow_unused=True to the grad call and then check which of the returned gradients are None.
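A minimal sketch of that recipe, assuming a second-order setup such as a gradient penalty built from the first gradients. The parameter names w_lin and w_sq, the input x, and the penalty itself are hypothetical and only for illustration. The loss is linear in w_lin, so its first gradient is just x and w_lin never appears in the second backward pass:

```python
import torch

# Hypothetical toy parameters: the loss is linear in w_lin and quadratic in w_sq.
params = {
    "w_lin": torch.tensor(3.0, requires_grad=True),
    "w_sq": torch.tensor(2.0, requires_grad=True),
}
x = torch.tensor(1.5)  # fixed input, no gradient needed

loss = params["w_lin"] * x + params["w_sq"] ** 2 * x

# First-order gradients, keeping the graph so they can be differentiated again.
# d loss / d w_lin = x            -> does not depend on w_lin
# d loss / d w_sq  = 2 * w_sq * x -> depends on w_sq
first = torch.autograd.grad(loss, list(params.values()), create_graph=True)

# Hypothetical scalar built from the first gradients (e.g. a gradient penalty).
penalty = sum(g.sum() for g in first)

# Without allow_unused=True this raises the error from the question,
# because w_lin does not appear in the graph of `penalty`.
try:
    torch.autograd.grad(penalty, list(params.values()), retain_graph=True)
except RuntimeError as err:
    print("error:", err)

# With allow_unused=True, unused inputs come back as None instead of raising.
second = torch.autograd.grad(penalty, list(params.values()), allow_unused=True)
unused = [name for name, g in zip(params, second) if g is None]
print("unused:", unused)  # unused: ['w_lin']
```

In a real model the same check works by zipping the returned gradients with the names from model.named_parameters() instead of the hand-written dict.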

Thanks for your timely response.
What is the meaning of "the loss is linear in a parameter and thus the first gradient does not depend on it"?
Do you mean that, assuming y = 2x, d(dy/dx)/dx = 0? But why does PyTorch just set the gradient to 0?