Tensor View Operations on Non-contiguous tensors

DerekGloudemans · November 8, 2021, 3:57pm

My understanding of Tensor View operations is that they require a contiguous tensor input to work (which intuitively makes sense). To test this behavior I performed the following test, the results of which I find confusing:

import torch

test = torch.rand([10,10])
test.is_contiguous() # True

test = test[:8,1:9] # now of size [8,8]
test.is_contiguous() # False, as expected

test.view(-1) # returns error, as expected
# RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

# confusingly, no errors from any of these operations
test.diagonal() 
test.permute(1,0)
test.unbind(1)
...

My question is, why do operations such as diagonal and permute not cause contiguous errors, despite the fact that they are Tensor View operations as listed here?

My hypothesis is that these functions call .contiguous() on their inputs before performing the actual desired operation. This wouldn’t impact the performance of these calls much. I’m having trouble verifying this, though, because it is extremely difficult to find the source code for these operations.

googlebot · November 8, 2021, 5:41pm

Your tensor is an easy case: its stride() is still monotonically decreasing, and it is non-contiguous only because the “logical” size of the last dimension is reduced, i.e. some memory areas are skipped.

So a diagonal view is possible. If you check diagonal().stride(), it is (11,): each step along the diagonal moves one row (10 elements) plus one column (1 element) through the underlying 10x10 grid, and only 8 elements are taken.

And permute() just permutes metadata (sizes & strides), so it never fails by itself.
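To make the stride argument above concrete, here is a small sketch that prints the metadata for the sliced tensor from the question. The variable names (base, diag, perm, shares_memory) are only for illustration; the point is that diagonal() and permute() return new size/stride descriptions of the same memory rather than copies.

```python
import torch

base = torch.rand(10, 10)
test = base[:8, 1:9]                 # same slice as in the question

# The slice is described by sizes, strides and an offset into base's memory.
print(test.shape, test.stride(), test.storage_offset())

diag = test.diagonal()               # 8 elements, one row + one column apart
perm = test.permute(1, 0)            # same memory, sizes and strides swapped

print(diag.shape, diag.stride())
print(perm.shape, perm.stride())

# All three tensors start at the same address, so no data was copied.
shares_memory = diag.data_ptr() == test.data_ptr() == perm.data_ptr()
print(shares_memory)
```

Neither call needs the input to be contiguous, because the result can still be written down as one offset plus one stride per dimension.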

DerekGloudemans · November 8, 2021, 5:52pm

Could you provide an example of diagonal that is not an easy case and fails when run on a non-contiguous tensor?

How would you characterize generally the operations on Tensor Views that cause errors when non-contiguous tensors are input?

Concretely, I cannot find any function calls that result in errors with non-contiguous inputs besides view, despite the fact that many other operations rely on tensor views. Does this indicate that I am simply not inputting complex enough inputs that the non-contiguity results in issues, or does it indicate that these functions lazily handle non-contiguous inputs by wrapping the operation in a call to contiguous()?

googlebot · November 8, 2021, 6:56pm

I don’t know what the failure conditions of diagonal() are; maybe there are none.

To your second question: it is operation dependent, but generally either contiguous() is done under the hood, or tensor iterators handle the read pointers, or CUDA kernels do the address arithmetic (and perform worse if different areas must be read). IIRC, torch’s operations very rarely refuse to work because of non-contiguity.
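A quick way to check this behaviour for a particular operation is to compare its result on the strided tensor with its result on an explicit contiguous copy. The sketch below does that for an elementwise op and a reduction, and also shows reshape() falling back to a copy where view(-1) raised the error. It says nothing about which internal mechanism each op uses; it only checks that the results agree.

```python
import torch

test = torch.rand(10, 10)[:8, 1:9]   # non-contiguous slice from the question
packed = test.contiguous()           # explicit copy with stride (8, 1)

# Elementwise ops and reductions accept the strided input directly.
same_elementwise = torch.equal(test * 2, packed * 2)
same_reduction = torch.allclose(test.sum(dim=0), packed.sum(dim=0))

# reshape() succeeds where view(-1) failed, but it has to copy the data here.
flat = test.reshape(-1)
reshape_copied = flat.data_ptr() != test.data_ptr()

print(same_elementwise, same_reduction, reshape_copied)
```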

In most cases, I wouldn’t bother with explicit contiguous copies, unless the non-contiguous tensor is used multiple times, is used in a heavy operation like matmul, or has an ugly stride() that makes the next operation take a lot of extra time (as per the profiler).
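A rough sketch of that rule of thumb: make one explicit copy and reuse it when the same strided tensor feeds a heavy op many times. The avg_seconds helper is hypothetical (plain wall-clock timing, not the PyTorch profiler), the sizes are arbitrary, and whether the copy pays off depends on hardware and shapes, so measure your own case.

```python
import time
import torch

def avg_seconds(fn, repeats=20):
    """Hypothetical helper: average wall-clock time of fn() over a few runs."""
    fn()  # warm-up call, not timed
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

big = torch.rand(512, 512)
strided = big[:, ::2]              # every other column: stride (512, 2)
packed = strided.contiguous()      # one explicit copy, reused below
weight = torch.rand(256, 64)

t_strided = avg_seconds(lambda: strided @ weight)
t_packed = avg_seconds(lambda: packed @ weight)
print(f"strided: {t_strided:.2e}s  packed: {t_packed:.2e}s")

# Both layouts give the same product up to floating-point rounding.
same_result = torch.allclose(strided @ weight, packed @ weight, rtol=1e-4, atol=1e-4)
```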

Siddhanth_Ramani · July 5, 2024, 10:32pm

How does it intuitively make sense?

DerekGloudemans · July 6, 2024, 2:49am

I did some more extensive research on tensor views and wrote it up in a Stack Overflow answer, which addresses the original question: “What functions or modules require contiguous input?” (python, Stack Overflow).

DerekGloudemans · July 6, 2024, 2:56am

Sorry, imprecise description. Intuitive to me anyway, because a tensor view is essentially a pointer to a memory location rather than independently allocated memory. If data is not stored in a single place, then an arbitrary number of memory pointers are required to express the tensor view; this quickly grows in scale to be on the same order of magnitude as the original data and eliminates any advantage of using a tensor view; thus, the developers precluded this possibility.

Now it is relatively simple and within the scope of tensor views to keep track of simple indexing operations: “an [x,y] tensor is at memory location Z, and should be viewed with a stride of 2” is a precise and compact description, whereas “element 0 is stored at memory location Z, element 1 is stored at memory location A, …” is not. For this reason, tensor views are restricted to contiguous tensors and operations that can be simply expressed and therefore stored on those contiguous tensors.
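The “compact description” idea can be checked directly: a strided tensor is an offset plus one stride per dimension, and element [i, j] lives at offset + i*stride0 + j*stride1 in the underlying memory. The sketch below verifies that for the slice from the original question and rebuilds the slice with torch.as_strided. It also shows that, as far as I can tell from the documented view() condition, view() itself accepts a non-contiguous input when the new shape only splits a dimension. The names storage, manual, rebuilt and split are only for illustration.

```python
import torch

base = torch.arange(100.0).reshape(10, 10)
t = base[:8, 1:9]                  # non-contiguous slice, as in the question
storage = base.flatten()           # base is contiguous, so this is memory order

# Address arithmetic: element [i, j] sits at offset + i*s0 + j*s1.
offset = t.storage_offset()
s0, s1 = t.stride()
i, j = 3, 5
manual = storage[offset + i * s0 + j * s1]
print(manual.item(), t[i, j].item())

# The same slice rebuilt from nothing but sizes, strides and offset.
rebuilt = torch.as_strided(base, (8, 8), (s0, s1), offset)
print(torch.equal(rebuilt, t))

# Splitting the last dimension stays expressible with strides, so view() works.
split = t.view(8, 2, 4)
print(split.stride())
```

Flattening with view(-1) fails for this tensor because no single stride can step across the skipped elements at the end of each row, which is the case the error message in the original post describes.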