How does PyTorch differentiate a non-scalar variable?

There are plenty of sources on the forum, so please search before asking. For example, see the discussion here
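For a quick picture in the meantime, here is a minimal sketch of the usual behavior, not a full answer. As far as I know, `backward()` on a non-scalar tensor needs a `gradient` argument of the same shape, and what you get back is a vector-Jacobian product rather than the full Jacobian. The tensor values and the names `v`, `grad_vjp`, and `grad_sum` are made up for illustration.

```python
import torch

# Illustrative input; the values are arbitrary.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2  # non-scalar output, shape (3,)

# Calling y.backward() with no argument would raise a RuntimeError here,
# because y is not a scalar. Passing a vector v of the same shape as y
# makes autograd compute the vector-Jacobian product v^T J.
v = torch.ones_like(y)
y.backward(gradient=v)
grad_vjp = x.grad.clone()

# With v set to all ones, this matches reducing y to a scalar with sum()
# and then calling backward() on that scalar.
x.grad = None  # reset the accumulated gradient before the second pass
(x ** 2).sum().backward()
grad_sum = x.grad.clone()

print(grad_vjp)
print(grad_sum)
```

Here the Jacobian of `x ** 2` is diagonal with entries `2 * x`, so both gradients should come out as `2 * x`. A different `v` weights each output element differently.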