# Generating a 2D matrix with 1s at particular positions

I have a 1D tensor `s` with elements in ascending order. I want to generate a 2D tensor `a` of shape `(len(s)-1, s[-1])` such that its elements `[i, s[i]:s[i+1]]` are `1`, all other elements being `0`. How can I do this vectorially without using loops?

Example:

Input: `s = torch.tensor([0,2,5,7])`
Output:

``````a = torch.tensor([
[1,1,0,0,0,0,0],
[0,0,1,1,1,0,0],
[0,0,0,0,0,1,1]
])
``````

Loop implementation:

``````a = torch.zeros(len(s)-1, s[-1])
for i in range(len(a)):
    a[i, s[i]:s[i+1]] = 1
``````

This is part of training a neural network, so I’m looking for a vector implementation.
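For reference, one loop-free way to build such a mask (a sketch, not necessarily the fastest option) is to broadcast a row of column positions against the split boundaries:

```python
import torch

s = torch.tensor([0, 2, 5, 7])
cols = torch.arange(s[-1].item())  # positions 0..6

# Row i is 1 where s[i] <= col < s[i+1]; the comparisons broadcast to (3, 7)
a = ((cols >= s[:-1, None]) & (cols < s[1:, None])).float()
print(a)
```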

In case it helps, this is actually part of a bigger problem where I am trying to sum unevenly-sized rows of a 2D tensor.

As an example:

``````inp = torch.tensor([
[0,1],
[2,3],
[4,5],
[6,7],
[8,9],
[10,11],
[12,13]
])
s = torch.tensor([0,2,5,7])
``````

I want my output to have 3 rows: row 0 should sum rows `0:2` of `inp`, row 1 should sum rows `2:5`, and row 2 should sum rows `5:7`. So the output should be:

``````out = torch.tensor([
[2,4],
[18,21],
[22,24]
])
``````

I realized I could do this if I can generate `a` as above, then `out = a@inp`.

I know I can solve the problem using `scatter_add`, but I don’t know how to construct the `index` parameter of `scatter_add` from `s` without using loops.
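(Sketching one possibility: the `index` can itself be built without a Python loop via `torch.repeat_interleave`, which accepts a tensor of repeat counts. Combined with `index_add_`, which takes a 1D index and plays the same role as `scatter_add` here:)

```python
import torch

inp = torch.tensor([[0, 1], [2, 3], [4, 5], [6, 7],
                    [8, 9], [10, 11], [12, 13]], dtype=torch.float)
s = torch.tensor([0, 2, 5, 7])

# Row i of inp belongs to output row index[i]: [0, 0, 1, 1, 1, 2, 2]
index = torch.repeat_interleave(torch.arange(len(s) - 1), s[1:] - s[:-1])

# Accumulate each row of inp into its output row
out = torch.zeros(len(s) - 1, inp.size(1)).index_add_(0, index, inp)
print(out)
```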

I think you will have to loop over the splits.
But here’s how it can be done:
Convert your `s` tensor to a tensor of split lengths: `[2-0, 5-2, 7-5] = [2, 3, 2]`.
Then use `split`, `sum`, `stack` as follows:
`torch.stack([i.sum(0) for i in torch.split(inp, [2,3,2])])`
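(A runnable sketch of the above, computing the split lengths from `s` with a vectorized difference:)

```python
import torch

inp = torch.tensor([[0, 1], [2, 3], [4, 5], [6, 7],
                    [8, 9], [10, 11], [12, 13]])
s = torch.tensor([0, 2, 5, 7])

lengths = (s[1:] - s[:-1]).tolist()  # [2, 3, 2]
out = torch.stack([chunk.sum(0) for chunk in torch.split(inp, lengths)])
print(out)
```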

Thanks, yeah that’s definitely a good way of doing it.

But I’m still wondering if there’s a non-loop way of doing it, i.e. with a constant number of tensor operations regardless of the number of splits. Something like `torch.sum(inp, splits=s)` that would automatically sum over contiguous chunks of uneven sizes and collect the results together.

Not the prettiest solution, but it does away with loops.

``````import torch

s = torch.tensor([0, 2, 5, 7])
w = s.size(0) - 1

# Each row of sp becomes [leading zeros, ones, trailing zeros] for one output row
sp = torch.stack([torch.zeros(w, dtype=s.dtype), s[:-1], s[1:],
                  torch.full((w,), s.max().item())]).T
sp = sp[:, 1:] - sp[:, :-1]
sp = sp.int()

z = torch.stack([torch.cat([torch.zeros(sp[i, 0].item()),
                            torch.ones(sp[i, 1].item()),
                            torch.zeros(sp[i, 2].item())])
                 for i in range(sp.size(0))])
print(z)
``````

EDIT: That should have been `sp.size(0)` in the `range(...)` call, and the three lengths should index the columns of `sp` (`sp[i, 0]`, `sp[i, 1]`, `sp[i, 2]`). Should work now. Please try again.

Here is a bigger test on the above method:

``````import torch

s=torch.arange(10000)*2
print(s)
``````

tensor([ 0, 2, 4, …, 19994, 19996, 19998])

So the final size should be 19998 on dim 1.

``````w = s.size(0) - 1
sp = torch.stack([torch.zeros(w, dtype=s.dtype), s[:-1], s[1:],
                  torch.full((w,), s.max().item())]).T
sp = sp[:, 1:] - sp[:, :-1]
sp = sp.int()
print(sp)
``````

tensor([[ 0, 2, 19996],
[ 2, 2, 19994],
[ 4, 2, 19992],
…,
[19992, 2, 4],
[19994, 2, 2],
[19996, 2, 0]], dtype=torch.int32)

``````z = torch.stack([torch.cat([torch.zeros(sp[i, 0].item()),
                            torch.ones(sp[i, 1].item()),
                            torch.zeros(sp[i, 2].item())])
                 for i in range(sp.size(0))])
print(z)
``````

tensor([[1., 1., 0., …, 0., 0., 0.],
[0., 0., 1., …, 0., 0., 0.],
[0., 0., 0., …, 0., 0., 0.],
…,
[0., 0., 0., …, 0., 0., 0.],
[0., 0., 0., …, 1., 0., 0.],
[0., 0., 0., …, 0., 1., 1.]])

``````print(z.size())
``````

torch.Size([9999, 19998])

Total run time was under 1 second on CPU.

Thank you, that’s a clever solution! I really appreciate you taking the time to write a test case.

I noticed that it still uses loops, i.e. `for i in range(sp.size(0))`. I did a time comparison between (1) @rinkujadhav2013’s solution and (2) this one; (1) is a bit faster.
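(For anyone who wants to reproduce the comparison, here is a rough `timeit` sketch contrasting the split/sum approach with a broadcast-mask-plus-matmul approach; the exact numbers depend on hardware and sizes, so treat it as illustrative only:)

```python
import timeit
import torch

s = torch.arange(1000) * 2          # 999 splits of length 2 each
inp = torch.randn(s[-1].item(), 2)
lengths = (s[1:] - s[:-1]).tolist()

def split_sum():
    # list-comprehension approach
    return torch.stack([c.sum(0) for c in torch.split(inp, lengths)])

def mask_matmul():
    # build the 0/1 mask by broadcasting, then one matmul
    cols = torch.arange(s[-1].item())
    a = ((cols >= s[:-1, None]) & (cols < s[1:, None])).float()
    return a @ inp

print("split+sum  :", timeit.timeit(split_sum, number=10))
print("mask+matmul:", timeit.timeit(mask_matmul, number=10))
```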

In conclusion, it seems like loops are unavoidable for this problem. I think loops might be unavoidable because the splits are of uneven size: when doing matrix multiplications, dimensions need to align across the things being parallelised.

Well, torch_scatter is built to handle splits of uneven size vectorially, provided one can provide the correct arguments to it. In my case, constructing the `index` argument to `scatter_add` needed a loop, that’s why I was looking at alternatives.

Torch `scatter` and `gather` use loops as far as I can tell.
Check the CUDA kernel here: pytorch/ScatterGatherKernel.cu at d7847ed23eb8880c7541319aab8864a45e239994 · pytorch/pytorch · GitHub
or the CPU kernels here:
pytorch/ScatterGatherKernel.cpp at 8f5f15a64b7f2efabfc9137ddc1ffb3390375626 · pytorch/pytorch · GitHub

Oh, that’s interesting. Probably explains why a torch scatter solution I came up with (with loops to construct the `index` matrix) took longer than your suggested `torch.stack` solution.

@souryadey when a for loop is contained within square brackets (i.e. `[k for k in x]`), it becomes a special asynchronous iterator in Python. Asynchronous means each step of `k` computes separately and doesn’t wait for other iterations of `k` to compute first. So it’s effectively not a “for loop”, though it still uses `for`.

My understanding is that something like `[k for k in x]` is a list comprehension, which may be faster than a regular loop in certain settings, but it doesn’t give the performance of vectorized processing. This article has a good discussion.