Tensor View Operations on Non-contiguous Tensors

My understanding of Tensor View operations is that they require a contiguous tensor input to work (which intuitively makes sense). To test this behavior I performed the following test, the results of which I find confusing:

import torch

test = torch.rand([10,10])
test.is_contiguous() # True

test = test[:8,1:9] # now of size [8,8]
test.is_contiguous() # False, as expected

test.view(-1) # returns error, as expected
# RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

# confusingly, no errors from any of these operations
test.diagonal() 
test.permute(1,0)
test.unbind(1)
...

My question is: why do operations such as diagonal and permute not raise contiguity errors, despite being listed as Tensor View operations here?

My hypothesis is that these functions call .contiguous() on their inputs before performing the actual operation, which wouldn’t affect their performance much. I’m having trouble verifying this, though, because the source code for these operations is very hard to track down.
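
One rough way to sanity-check this without digging through the source (assuming a recent PyTorch where Tensor.untyped_storage() is available) is to see whether the results share storage with the input; a hidden contiguous() call would produce a copy in fresh memory:

import torch

test = torch.rand(10, 10)[:8, 1:9]       # non-contiguous slice, as above

diag = test.diagonal()
perm = test.permute(1, 0)

# If these ops copied to contiguous memory first, the results would live in new storage.
print(diag.untyped_storage().data_ptr() == test.untyped_storage().data_ptr())  # True
print(perm.untyped_storage().data_ptr() == test.untyped_storage().data_ptr())  # True

# For contrast, a call that does have to copy:
flat = test.reshape(-1)                  # view(-1) fails here, so reshape falls back to a copy
print(flat.untyped_storage().data_ptr() == test.untyped_storage().data_ptr())  # False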

Your tensor is an easy case: its stride() is monotonically decreasing, and it is non-contiguous only because the “logical” size in the last dimension is reduced, i.e. some memory areas are skipped.

So a diagonal view is possible (if you check diagonal().stride(), it is (11,): the diagonal iterates the underlying 10x10 grid, stepping one row plus one column at a time, but takes only 8 elements).

And permute() just permutes metadata (sizes & strides), so it never fails by itself.
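
Spelling both points out on the tensor from the question (a quick sketch; only sizes, strides and offsets matter here):

import torch

test = torch.rand(10, 10)[:8, 1:9]

print(test.size(), test.stride(), test.storage_offset())
# torch.Size([8, 8]) (10, 1) 1  -> still strides over the full 10x10 buffer, skipping column 0

d = test.diagonal()
print(d.size(), d.stride())
# torch.Size([8]) (11,)  -> one row (10) plus one column (1) per step, 8 elements total

p = test.permute(1, 0)
print(p.size(), p.stride(), p.data_ptr() == test.data_ptr())
# torch.Size([8, 8]) (1, 10) True  -> only sizes/strides are swapped, no data is moved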

Could you provide an example of diagonal that is not an easy case and fails when run on a non-contiguous tensor?

How would you characterize generally the operations on Tensor Views that cause errors when non-contiguous tensors are input?

Concretely, I cannot find any calls besides view that produce errors on non-contiguous inputs, even though many other operations rely on tensor views. Does this mean my inputs simply aren’t complex enough for the non-contiguity to cause issues, or do these functions handle non-contiguous inputs lazily by wrapping the operation in a call to contiguous()?

I don’t know what the failure conditions of diagonal() are; maybe there are none.

To your second question: it is operation-dependent, but generally either contiguous() is called under the hood, or tensor iterators handle the read pointers, or CUDA kernels do the address arithmetic (and perform worse if scattered memory areas must be read). IIRC, torch’s operations very rarely refuse to work because of non-contiguity.
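
For example, an elementwise op happily reads through the strided input and writes a fresh contiguous output (a minimal sketch of the “handled under the hood” case):

import torch

test = torch.rand(10, 10)[:8, 1:9]              # non-contiguous, as in the question

out = test * 2                                  # elementwise op: the iterator walks the input’s strides
print(out.is_contiguous())                      # True - the newly allocated output is contiguous
print(torch.equal(out, test.contiguous() * 2))  # True - same values with or without an explicit copy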

In most cases, I wouldn’t bother with explicit contiguous copies, unless a non-contiguous tensor is reused multiple times, is fed into a heavy operation like matmul, or has such an ugly stride() that the next operation takes a lot of extra time (as per the profiler) because of it.
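
A rough sketch of the “reused multiple times / heavy op” case (sizes and the weights list are placeholders, not a benchmark):

import torch

x = torch.rand(1024, 2048)[:, ::2]                    # non-contiguous: stride 2 in the last dimension
weights = [torch.rand(1024, 1024) for _ in range(8)]  # hypothetical operands x is multiplied with

x = x.contiguous()                # pay for one explicit copy up front...
outs = [x @ w for w in weights]   # ...so each heavy matmul reads a densely laid-out operand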