# Efficient way to concat different size tensors without using a loop

Hi there,

I have a list of different-size tensors, and I want to concat some of them, selected by indices, without using a for-loop.
Here is an example to do it using a loop:

```python
import torch

item_features = [
    torch.LongTensor([0]),
    torch.LongTensor([1, 10]),
    torch.LongTensor([2, 27, 31]),
    torch.LongTensor([3]),
    torch.LongTensor([4]),
    torch.LongTensor([5, 11]),
]

item_ids = torch.LongTensor([1, 5])

result = torch.cat([
    item_features[item_id]
    for item_id in item_ids
])
result  # tensor([ 1, 10,  5, 11])
```

I also tried to do it by storing the tensors as a sparse CSR matrix:

```python
item_features = torch.LongTensor([0, 1, 10, 2, 27, 31, 3, 4, 5, 11])
item_indices = torch.LongTensor([0, 1, 3, 6, 7, 8, 10])

item_ids = torch.LongTensor([1, 5])

result = torch.cat([
    item_features[item_indices[item_id]: item_indices[item_id + 1]]
    for item_id in item_ids
])
result  # tensor([ 1, 10,  5, 11])
```

But this still slices in a loop, which is similar to this topic: How to slice multiple intervals over a tensor.

So, how can I do this in an efficient tensor-like way?


Hi Vasiliy!

If you start with a list of tensors, you will need to loop over that list one
way or another.

Pytorch tensor indexing with a boolean index tensor returns a tensor
whose shape is not that of the index tensor. Instead, it returns a
flattened tensor whose length is the number of `True`s in the index
tensor.

This lets you get around the fact that individual elements of `item_features`
differ in length, making it possible to avoid looping over `item_ids`.

Consider:

```python
>>> import torch
>>> print (torch.__version__)
2.1.0
>>>
>>> item_features = torch.LongTensor([0, 1, 10, 2, 27, 31, 3, 4, 5, 11])
>>> item_indices = torch.LongTensor([0, 1, 3, 6, 7, 8, 10])
>>>
>>> item_ids = torch.LongTensor([1, 5])
>>>
>>> # version with loop (list comprehension)
>>> result = torch.cat([
...     item_features[item_indices[item_id]: item_indices[item_id + 1]]
...     for item_id in item_ids
... ])
>>> result  # tensor([ 1, 10,  5, 11])
tensor([ 1, 10,  5, 11])
>>>
>>> # loop-free version using boolean indexing
>>> seq = torch.arange (len (item_features)).unsqueeze (0)
>>> inds = torch.logical_and (seq >= item_indices[:-1, None][item_ids], seq < item_indices[1:,  None][item_ids])
>>> resultB = item_features.expand (inds.shape)[inds]
>>> resultB
tensor([ 1, 10,  5, 11])
>>>
>>> torch.equal (resultB, result)
True
```

Best.

K. Frank

Hi, KFrank! Thank you for the answer!

> If you start with a list of tensors, you will need to loop over that list one
> way or another.

Yes, I started with a list, which isn't a good structure, and I'm thinking about how to store my features in an efficient way.
As for the solution with `torch.logical_and`: it creates a tensor of size `len(item_features) x len(item_ids)`, and this is a new challenge. During training I have ~10,000,000 features and ~100,000 item_ids (which is my batch size), and I can't create a 10,000,000 x 100,000 tensor.
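For scale, here is a quick back-of-the-envelope sketch (using the sizes from this post) of why the dense boolean mask can't be materialized; the one-byte-per-element figure is for `torch.bool`:

```python
# Dense mask from the boolean-indexing approach has shape
# (batch_size, n_features), with torch.bool using 1 byte per element.
n_features = 10_000_000
batch_size = 100_000
mask_bytes = n_features * batch_size
print(mask_bytes / 1e12, "TB")  # 1.0 TB
```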

It seems I found a solution:

```python
item_features = torch.LongTensor([0, 1, 10, 2, 27, 31, 3, 4, 5, 11])
# item_indices[item_id] is the index where item_id's features start in item_features
item_indices = torch.LongTensor([0, 1, 3, 6, 7, 8])
# item_features_sizes[item_id] is the number of features of item_id
item_features_sizes = torch.LongTensor([1, 2, 3, 1, 1, 2])

item_ids = torch.LongTensor([1, 5])  # batch

indices = item_indices[item_ids]
sizes = item_features_sizes[item_ids]
prep_indices = indices - torch.cat([
    torch.LongTensor([0]),
    sizes.cumsum(0)[:-1],
])
feature_indices = (
    torch.repeat_interleave(prep_indices, sizes) + torch.arange(sizes.sum())
)
result = item_features[feature_indices]  # tensor([ 1, 10,  5, 11])
```
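For what it's worth, the same steps can be wrapped into a small reusable helper (a sketch; the name `gather_ragged` is my own):

```python
import torch

def gather_ragged(values, starts, sizes, ids):
    """Concatenate the variable-length slices selected by `ids`.

    values: 1-d tensor holding all features back to back
    starts: starts[i] is the offset of item i's first feature in `values`
    sizes:  sizes[i] is the number of features of item i
    ids:    items to gather, in order
    """
    sel_starts = starts[ids]
    sel_sizes = sizes[ids]
    # Offset of each selected slice within the output tensor.
    out_offsets = torch.cat([sel_sizes.new_zeros(1), sel_sizes.cumsum(0)[:-1]])
    # For every output position, compute its index into `values`.
    idx = (torch.repeat_interleave(sel_starts - out_offsets, sel_sizes)
           + torch.arange(int(sel_sizes.sum())))
    return values[idx]

item_features = torch.tensor([0, 1, 10, 2, 27, 31, 3, 4, 5, 11])
item_indices = torch.tensor([0, 1, 3, 6, 7, 8])
item_features_sizes = torch.tensor([1, 2, 3, 1, 1, 2])

print(gather_ragged(item_features, item_indices, item_features_sizes,
                    torch.tensor([1, 5])))  # tensor([ 1, 10,  5, 11])
```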

But maybe there is a more elegant way to do that.

Hi Vasiliy!

That looks sound.

I really don't think there is; at least I don't see one.

At issue is the fact that pytorch tensors don't support* "ragged" tensors
(that is, tensors whose individual rows differ in length). So one way or
another you have to pluck differing numbers of elements out of
`item_features`.

Your solution does that by using `repeat_interleave()`. (My approach
uses boolean indexing.) There's bound to be some "inelegant" processing
in order to implement that step.

*) Pytorch does have sparse tensors, but support for them is pretty limited.
Perhaps you could try that and hope that as you implemented your
algorithm all of the tensor operations you needed were, in fact, supported
for sparse tensors.
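For completeness, a minimal sketch of what that sparse storage would look like using `torch.sparse_csr_tensor` (the column indices, which I chose as each feature's position within its row, are my own convention for this illustration):

```python
import torch

# Ragged features stored as a sparse CSR tensor (illustration only).
crow_indices = torch.tensor([0, 1, 3, 6, 7, 8, 10])
col_indices = torch.tensor([0, 0, 1, 0, 1, 2, 0, 0, 0, 1])
values = torch.tensor([0, 1, 10, 2, 27, 31, 3, 4, 5, 11])
sp = torch.sparse_csr_tensor(crow_indices, col_indices, values, size=(6, 3))

# The CSR internals are exactly the flat (item_features, item_indices)
# pair from earlier in the thread, so the sparse format mostly adds
# bookkeeping rather than new gather operations.
print(sp.values())        # the concatenated features
print(sp.crow_indices())  # the per-item offsets
```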

Best.

K. Frank
