My understanding of Tensor View operations is that they require a contiguous tensor input to work (which intuitively makes sense). To test this behavior I performed the following test, the results of which I find confusing:
test = torch.rand([10,10])
test.is_contiguous() # True
test = test[:8,1:9] # now of size [8,8]
test.is_contiguous() # False, as expected
test.view(-1) # returns error, as expected
# RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
# confusingly, view operations like permute() and diagonal() on this tensor raise no errors
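For reference, here is the full experiment as a runnable snippet (the specific view operations tried after the slice are my reconstruction, based on the ones discussed below):

```python
import torch

test = torch.rand([10, 10])
assert test.is_contiguous()

test = test[:8, 1:9]          # slice: now of size [8, 8], non-contiguous
assert not test.is_contiguous()

# test.view(-1) would raise RuntimeError here, as described above.

# These view operations, however, run without error on the same tensor:
p = test.permute(1, 0)        # swaps the size/stride metadata
d = test.diagonal()           # stride-based view of the diagonal
t = test.t()                  # transpose, also metadata-only
print(p.shape, d.shape, t.shape)
# torch.Size([8, 8]) torch.Size([8]) torch.Size([8, 8])
```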
My question is: why do operations such as permute not cause contiguity errors, despite the fact that they are Tensor View operations as listed here?
My hypothesis is that these functions call .contiguous() on their inputs before performing the actual desired operation, which wouldn’t impact the performance of these calls much. I’m having trouble verifying this, though, because the source code for these operations is extremely difficult to find.
Your tensor is an easy case: its stride() is monotonically decreasing, and it is non-contiguous only because the “logical” size in the last dimension was reduced, i.e. some memory areas are skipped.
So a diagonal view is still possible. If you check diagonal().stride(), it is 11: the view steps through a 10x10 grid (one row plus one column per step), but takes only 8 elements.
And permute() just permutes metadata (sizes & strides), so it never fails by itself.
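This metadata-only behavior is easy to inspect directly; a minimal sketch of the case described above:

```python
import torch

base = torch.rand(10, 10)
t = base[:8, 1:9]              # non-contiguous: size (8, 8), stride (10, 1)
print(t.size(), t.stride())    # torch.Size([8, 8]) (10, 1)

# permute only swaps the (size, stride) pairs; the storage is untouched
p = t.permute(1, 0)
print(p.size(), p.stride())    # torch.Size([8, 8]) (1, 10)
assert p.data_ptr() == t.data_ptr()   # same underlying memory

# diagonal is also pure stride arithmetic: row stride 10 + col stride 1 = 11
d = t.diagonal()
print(d.stride())              # (11,)
```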
Could you provide an example of diagonal that is not an easy case and fails when run on a non-contiguous tensor?
How would you characterize, in general, the operations on Tensor Views that cause errors when non-contiguous tensors are input?
Concretely, I cannot find any function calls besides view that result in errors with non-contiguous inputs, despite the fact that many other operations rely on tensor views. Does this indicate that I am simply not providing inputs complex enough for the non-contiguity to cause issues, or does it indicate that these functions lazily handle non-contiguous inputs by wrapping the operation in a call to contiguous()?
I don’t know what the failure conditions of diagonal() are; maybe there are none.
To your second question: it is operation-dependent, but generally either contiguous() is done under the hood, or tensor iterators handle the read pointers, or CUDA kernels do the address arithmetic (and perform worse if scattered memory areas must be read). IIRC, torch’s operations very rarely refuse to work because of non-contiguity.
In most cases, I wouldn’t bother with explicit contiguous copies, unless the non-contiguous tensor is used multiple times, it feeds a heavy operation like matmul, or its stride() is ugly and the next operation takes a lot of extra time (as per the profiler) because of it.
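When one of those conditions does hold, the idiom is a single explicit copy up front so every later operation reads dense memory; a minimal sketch:

```python
import torch

t = torch.rand(10, 10)[:8, 1:9]        # non-contiguous slice
c = t.contiguous()                     # one explicit, dense copy
assert c.is_contiguous()
assert c.data_ptr() != t.data_ptr()    # new storage was allocated
assert torch.equal(c, t)               # same values, denser layout

# reuse the copy for repeated / heavy ops instead of re-reading
# the strided original each time
for _ in range(3):
    _ = c @ c
```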