Hi, I’m dealing with a weird problem that I can’t understand. In short, I don’t see why some lines of the minimal example below throw an IndexError: too many indices for tensor of dimension 2:
import torch

N = 1000
D = 42
t = torch.randn(N, D)  # the dataset (time series data)
W = 10
# these indices create windows to feed to e.g. an LSTM
indices = [range(i, i + W) for i in range(0, N - W)]
windowed_data = t[indices]        # this works
windowed_data = t[indices[:100]]  # this works too
windowed_data = t[indices[:10]]   # IndexError, wtf?
windowed_data = t[indices[:32]]   # this works
windowed_data = t[indices[:31]]   # IndexError
The above code shows that this list indexing works well when the indices list is “long enough”, which seems to mean a length of 32 or more, and throws an IndexError at a length of 31 or less.
Context: I am storing a time series dataset composed of N instances of dimensionality D as a tensor of shape (N, D), and I wrote code to “windowize” it into overlapping windows of W instances, bringing it to a shape of (N-W, W, D).
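For reference, this is the behavior I expected, written with an explicit index tensor (the arange construction below is just one way to build the same index matrix):

idx = torch.arange(N - W).unsqueeze(1) + torch.arange(W)  # shape (N-W, W) = (990, 10)
windowed_data = t[idx]
print(windowed_data.shape)  # torch.Size([990, 10, 42])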
I was doing some debugging in other parts of the code and stumbled upon this problem, which made me think that maybe I don’t understand tensor list indexing too well.
Can someone with more experience explain why this behavior appears?
The answer definitely has its roots in the C memory buffer where the tensor is stored; the translation of the Python slicing indices into C indices could also be an issue.
Ideally it should keep raising the error for any list longer than 2 elements, since the base tensor only has 2 axes to index into.
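As far as I can tell, the magic number comes from a heuristic that PyTorch inherited from NumPy in its Python indexing layer: a list shorter than 32 elements that contains sequences or slices is treated as if it were a tuple of per-dimension indices, while a longer list is always treated as a single index array. A minimal sketch of the consequence (shapes chosen only for illustration):

import torch

t = torch.randn(1000, 42)
two = [range(0, 3), range(0, 3)]   # a 2-element list containing sequences

# Shorter than 32 -> treated like t[two[0], two[1]], i.e. the pairs (0,0), (1,1), (2,2):
print(t[two].shape)                # torch.Size([3])

# The same indices passed as one tensor -> a single index array over dim 0:
print(t[torch.tensor(two)].shape)  # torch.Size([2, 3, 42])

# With more than two sequences in a short list, each one is matched to a
# tensor dimension, hence the error on a 2-D tensor:
# t[[range(0, 3)] * 5]             # IndexError: too many indices for tensor of dimension 2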
Thanks @InnovArul and @my3bikaht! Every test works perfectly if I use torch.tensor(indices) or the more explicit notation t[indices[:10], :].
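Concretely, both of these now work for any number of windows (shapes follow from the example above):

windowed_data = t[torch.tensor(indices)]  # index tensor -> shape (990, 10, 42)
windowed_data = t[indices[:10], :]        # explicit tuple -> shape (10, 10, 42)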
I’ll mentally archive this in my “weird things” section; I thought there was something specific going on that I didn’t know about. Thanks everyone for your help and participation!