Is there any other way to do this operation faster? Is it possible to avoid the for loop?

I have a list of tensors of different shapes. Each tensor in this list is used to select a subset of rows of a 2D tensor W. Consider the following example:

l1 = [torch.tensor([0, 1]),
 torch.tensor([0, 1]),
 torch.tensor([0, 2, 3, 5, 6]),
 torch.tensor([0, 2, 3, 5, 6]),
 torch.tensor([0, 2, 3, 5]),
 torch.tensor([0, 2, 3]),
 torch.tensor([0, 2, 4]),
 torch.tensor([0, 2, 4])]

l1
>>> [tensor([0, 1]),
 tensor([0, 1]),
 tensor([0, 2, 3, 5, 6]),
 tensor([0, 2, 3, 5, 6]),
 tensor([0, 2, 3, 5]),
 tensor([0, 2, 3]),
 tensor([0, 2, 4]),
 tensor([0, 2, 4])]

I have a global 2D weight tensor W of shape [7, 300]. For each tensor in the list above, I want to select the rows of W corresponding to the indices in that tensor. So, I’d use the first tensor, tensor([0, 1]), to select the 1st and 2nd rows of W. Similarly, I’d use the third tensor (tensor([0, 2, 3, 5, 6])) to select the 1st, 3rd, 4th, 6th and 7th rows of W. I do this using a for loop as follows.

W = torch.randn((7, 300), dtype=torch.float)
W
>>> tensor([[-0.3830, -0.5174, -1.8962,  ...,  0.9110,  1.4701,  0.4347],
        [ 0.2234, -0.0493, -1.1374,  ..., -0.5010,  0.0462,  0.6938],
        [-0.1562, -0.1772, -0.9825,  ..., -0.4177, -0.2688,  0.1964],
        ...,
        [ 1.9325, -0.5216,  1.0538,  ..., -1.1179, -0.1078,  1.0612],
        [-0.9618,  0.6065, -1.7601,  ...,  0.1251, -0.7518,  0.6883],
        [-1.0139,  0.2360,  1.4346,  ...,  0.0604,  0.4170, -1.3022]])

W.shape
>>> torch.Size([7, 300])

selected_rows = []
for idx in l1:
    selected_rows.append(W[idx])

[row.shape for row in selected_rows]
>>> [torch.Size([2, 300]),
 torch.Size([2, 300]),
 torch.Size([5, 300]),
 torch.Size([5, 300]),
 torch.Size([4, 300]),
 torch.Size([3, 300]),
 torch.Size([3, 300]),
 torch.Size([3, 300])]

As we can see, we get a list of tensors which are the selected rows of matrix W according to the indices provided in list l1. However, I want to avoid the for loop here. We could use torch.stack to convert the list l1 into a single tensor and then use that tensor to index the weight matrix W, but I cannot do that since the tensors in the list have different shapes. I don’t know if there is another way to do this operation in PyTorch without using a for loop. If there is, please let me know. Thanks in advance :slight_smile:
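For reference, this is what happens if I try torch.stack directly on the list (the exact error text may differ between PyTorch versions):

# l1 is still the original list of different-sized index tensors here
try:
    torch.stack(l1)
except RuntimeError as e:
    print(e)
>>> stack expects each tensor to be equal size, but got [2] at entry 0 and [5] at entry 2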

If all the tensors in the list had the same shape, like in the following, we could easily avoid the for loop:

l1 = torch.tensor([[0, 1], [0, 1]])
l1
>>> tensor([[0, 1],
        [0, 1]])

selected_rows = W[l1]
selected_rows.shape 
>>> torch.Size([2, 2, 300])

Hi,

What is the expected output? A list of Tensors?
If your Tensors in the list don’t all have the same size, you won’t be able to pack them into a single Tensor.
What might help though is to concatenate all the indices into a 1D Tensor. Then you can do all the indexing in one go.
If you still want the list as output, you can get it by using narrow(0, curr_idx, nb_el) for each element.
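Something along these lines (just a sketch, the variable names are arbitrary):

# index W once with all the indices concatenated, then rebuild the list as views
all_idx = torch.cat(l1)              # 1D tensor of all indices
all_rows = W[all_idx]                # single indexing operation

selected_rows = []
curr_idx = 0
for t in l1:
    nb_el = t.shape[0]
    selected_rows.append(all_rows.narrow(0, curr_idx, nb_el))  # view, no copy
    curr_idx += nb_el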

@albanD Thanks for the quick response. That’s what I did, and it looks like it’s working. No for loop is required initially, but I’m getting a different for loop at the end, and I don’t know how to avoid it. I did exactly what you suggested.

My approach:

I first concatenated all tensors into a 1D tensor, then used this concatenated tensor to index the matrix W.

concatenated = torch.cat(l1, dim=0)
concatenated.shape
>>>
torch.Size([27])

splits = tuple(i.shape[0] for i in l1)   # number of rows selected by each index tensor

selected_rows = W[concatenated]

splits is defined so that the single result tensor can later be split back according to the original list l1.
(Note that there is one for loop when defining splits, but I had to include it; it only reads shapes, so it shouldn’t be that costly.) After this point, I want to multiply all these rows with a 300-dimensional vector h, which I do as follows.

h = torch.randn(300, dtype=torch.float)      # h is just a random 300-dim vector for this example
dot_prods = torch.matmul(selected_rows, h)   # shape: [27]

# Now it's time to split this single tensor according to the original list l1

splitted_dot_prods = torch.split(dot_prods, splits)
splitted_dot_prods 
>>> 
(tensor([ -8.1478, -21.9214]),
 tensor([-8.1478, 21.9214]),
 tensor([  8.1478,  33.1918,  -7.2199, -17.2290,   6.6762]),
 tensor([  8.1478,  33.1918,  -7.2199, -17.2290,  -6.6762]),
 tensor([ 8.1478, 33.1918, -7.2199, 17.2290]),
 tensor([ 8.1478, 33.1918,  7.2199]),
 tensor([  8.1478, -33.1918,  24.0853]),
 tensor([  8.1478, -33.1918, -24.0853]))

As you can see, I now get a tuple of tensors, and for each tensor in this tuple I want to compute its sum (torch.sum). Again, a second for loop gets in the way.
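For concreteness, this is the second loop I mean (the final stack just collects the per-split sums into a single tensor):

# second for loop: sum each split and collect the results
sums = torch.stack([t.sum() for t in splitted_dot_prods])
sums.shape
>>> torch.Size([8])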

Is my approach more efficient than the one I proposed earlier? Can you see any way to avoid either of the for loops?

This one will definitely be more efficient as the for loop is only reading an attribute on the Tensor and not indexing a Tensor.
Also the split is equivalent to what I meant with narrow and will be fast.

If you want to sum all these Tensors you will have to add a new for-loop I’m afraid.
If you work on CPU it shouldn’t be too bad though.

Note that we are currently working on adding NestedTensors to pytorch: https://github.com/pytorch/nestedtensor
This will allow you to handle different-sized Tensors as a single Tensor.

@albanD Thanks for your inputs. I’m going with my approach, using the two less costly for loops. If you feel that the topic title should be different, please let me know. A good, relevant title would make it easier for others facing a similar issue to find this. Thanks :slight_smile:
