How to scatter with a variable number of indices

I want to create a multi-hot vector.

I tried the scatter_ function of Tensor.
It works fine when every row has the same number of indices.

import torch

batch_size = 4
dim = 7
idx = torch.LongTensor([[0,1],[2,3],[0,4],[0,5]])
hot_vec = torch.zeros(batch_size, dim)
hot_vec.scatter_(1, idx, 1.0)

Result:

    1     1     0     0     0     0     0
    0     0     1     1     0     0     0
    1     0     0     0     1     0     0
    1     0     0     0     0     1     0

But my data does not have the same number of indices per row. For

idx_v = torch.LongTensor([[0], [2,3], [0,1,4], [0,5]])

I want the hot vector like the one below.

    1     0     0     0     0     0     0
    0     0     1     1     0     0     0
    1     1     0     0     1     0     0
    1     0     0     0     0     1     0

What can I use to make this vector?


Not really a nice solution, but if you pad the LongTensor rows that are shorter than the longest one with repeats, it will work because all rows end up the same size. Your example, modified:

idx = torch.LongTensor([[0,0,0], [2,2,3], [0,1,4], [0,0,5]])

It works, but it is pretty hacky. You could write something to pad the shorter index rows with repeats until they reach the length of the longest one, as sketched below. Maybe someone who knows better can explain whether there is a more elegant workaround.
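A minimal sketch of that padding idea (the helper logic here is my own illustration, not from the original reply, and it assumes the indices start out as plain Python lists):

import torch

# Pad each row by repeating its first index until every row has the
# same length; repeated indices just overwrite the same 1.0 in scatter_.
idx_v = [[0], [2, 3], [0, 1, 4], [0, 5]]
max_len = max(len(row) for row in idx_v)
padded = [row + [row[0]] * (max_len - len(row)) for row in idx_v]

idx = torch.LongTensor(padded)
hot_vec = torch.zeros(len(idx_v), 7)
hot_vec.scatter_(1, idx, 1.0)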


Just stumbled across this old post while looking for something else, but a variable-length scatter is basically creating a sparse tensor:

>>> idx = torch.LongTensor([[0, 0], [1, 2], [1, 3], [2, 0], [2, 1], [2, 4], [3, 0], [3, 5]])
>>> val = torch.ones(len(idx))
>>> hot_vec = torch.sparse.FloatTensor(idx.t(), val, torch.Size([4, 7]))
>>> hot_vec.to_dense()
tensor([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  1.,  1.,  0.,  0.,  0.],
        [ 1.,  1.,  0.,  0.,  1.,  0.,  0.],
        [ 1.,  0.,  0.,  0.,  0.,  1.,  0.]])
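In case it helps, here is a small sketch of how to derive those (row, column) pairs from the original variable-length lists; this is my own addition building on the reply above, and it assumes idx_v is a plain Python list of lists as in the question:

import torch

# Flatten the ragged index lists into (row, column) coordinate pairs,
# then build the sparse tensor exactly as in the reply above.
idx_v = [[0], [2, 3], [0, 1, 4], [0, 5]]
coords = [[row, col] for row, cols in enumerate(idx_v) for col in cols]

idx = torch.LongTensor(coords)
val = torch.ones(len(idx))
hot_vec = torch.sparse.FloatTensor(idx.t(), val, torch.Size([len(idx_v), 7]))
hot_vec.to_dense()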