No particular reason – your T is a 3d tensor, so the two versions are
equivalent. I suppose that using "negative" dimensions emphasizes
that we're extracting the diagonals from the 2d matrices made up by
the last two dimensions of T (so that this version would generalize to a
hypothetical use case where T had multiple leading "batch" dimensions,
such as T of shape [batch_size, channel_size, size_n, size_n]).
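To illustrate, here is a small sketch using `torch.diagonal` (whose `dim1` / `dim2` arguments accept negative indices); the tensor shapes are made up for the example:

```python
import torch

# 3d tensor: the two versions are equivalent, since dims 1 and 2
# are the same as dims -2 and -1.
T = torch.randn(4, 5, 5)
d_pos = torch.diagonal(T, dim1=1, dim2=2)
d_neg = torch.diagonal(T, dim1=-2, dim2=-1)
assert torch.equal(d_pos, d_neg)

# With multiple leading "batch" dimensions, the negative-dim version
# still picks out the 2d matrices in the trailing two dimensions.
T4 = torch.randn(2, 3, 5, 5)  # [batch_size, channel_size, size_n, size_n]
d4 = torch.diagonal(T4, dim1=-2, dim2=-1)
print(d4.shape)  # torch.Size([2, 3, 5])
```

Note that `torch.diagonal` appends the extracted diagonal as the last dimension of the result.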

It's really just stylistic – and not necessarily a better style.