How to multiply array elements based on repeated elements in a second array

drelyea_blackberry · November 23, 2020, 5:42am

I have what seems like a simple problem, but I cannot find answers for it anywhere. If I have two arrays, I want to multiply/combine the elements of one of them based on whether or not the elements in the other are sequential or repeated. For instance,

array_with_repeated_elements = tensor([1, 2, 0, 0, 2, 2, 2, 1, 0, 0])
# could just as well be [a, b, c, c, d, d, d, e, f, f]
array_to_be_multiplied = tensor([1., 3., 5., 2., 2., 7., 2., 4., 3., 4.])

desired_output = tensor([1, 3, 10, 28, 4, 12])
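To pin down the intended semantics, here is a minimal pure-Python sketch of the grouping rule: values are multiplied over each run of consecutive equal keys. The helper name `product_of_runs` is hypothetical (it is not from any library), and plain lists stand in for the tensors.

```python
from itertools import groupby
from math import prod  # available in Python 3.8+


def product_of_runs(keys, values):
    """Multiply `values` over each run of consecutive equal `keys` (hypothetical helper)."""
    out = []
    start = 0
    for _, run in groupby(keys):
        # groupby only merges *adjacent* equal keys, which is the rule wanted here
        length = sum(1 for _ in run)
        out.append(prod(values[start:start + length]))
        start += length
    return out


keys = [1, 2, 0, 0, 2, 2, 2, 1, 0, 0]
values = [1., 3., 5., 2., 2., 7., 2., 4., 3., 4.]
result = product_of_runs(keys, values)
print(result)
```

Note that the two runs of 0 (and of 2) are treated as separate groups because they are not adjacent.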

In numpy, this can be done easily:

first_index_of_each_sequence = np.hstack([0,np.where(array_with_repeated_elements[1:] != array_with_repeated_elements[0:-1])[0]+1])
# this creates array([0, 1, 2, 4, 7, 8])
desired_output = np.multiply.reduceat(array_to_be_multiplied, first_index_of_each_sequence)
# this creates array([ 1.,  3., 10., 28.,  4., 12.])

I can’t seem to do this in pytorch. The best guess I have is this monster:

first_index_of_each_sequence = torch.cat([torch.LongTensor((0,)), torch.where(array_with_repeated_elements[1:] != array_with_repeated_elements[0:-1])[0]+1, torch.LongTensor((len(array_with_repeated_elements),))])
# makes tensor([0, 1, 2, 4, 7, 8, 10])
size_of_each_sequence = first_index_of_each_sequence[1:] - first_index_of_each_sequence[0:-1]
# makes tensor([1, 1, 2, 3, 1, 2])
full_length_array_of_ascending_index_elements = torch.arange(len(size_of_each_sequence)).repeat_interleave(size_of_each_sequence)
# makes tensor([0, 1, 2, 2, 3, 3, 3, 4, 5, 5])
desired_output_base = torch.zeros(len(size_of_each_sequence))
desired_output_base.index_add_(0, full_length_array_of_ascending_index_elements, torch.log(array_to_be_multiplied))
# does what I want in log space, but ew if I ever have a zero
desired_output = torch.exp(desired_output_base)
# duh

Does anyone have any ideas on how to do this nicely? The easy numpy implementation suggests I’ve missed something in pytorch…

drelyea_blackberry · November 24, 2020, 6:53pm

That answer wouldn’t work for sequences with zeros or near-zeros in them (also really large numbers, but they’d render this whole thing a bit moot). I need a solution which won’t have numerical errors. That’s why I built the monstrosity up above.

In numpy, this is trivial. Is it not trivial in pytorch? Again, seems like I’ve missed some function.
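One possible direction, offered as a sketch rather than a confirmed answer: label each run with `torch.unique_consecutive(..., return_inverse=True)` and multiply within each label using `Tensor.scatter_reduce` with `reduce="prod"`. This assumes a PyTorch release recent enough to provide `scatter_reduce` with this signature (it postdates this thread) and a non-empty 1-D input. The function name `run_products_torch` is made up for illustration.

```python
import torch


def run_products_torch(keys, values):
    """Product of `values` over each run of consecutive equal `keys` (hypothetical helper)."""
    # group_ids[i] is the index of the run that element i belongs to,
    # e.g. tensor([0, 1, 2, 2, 3, 3, 3, 4, 5, 5]) for the example below
    _, group_ids = torch.unique_consecutive(keys, return_inverse=True)
    # run indices are non-decreasing, so the last one is the largest
    num_groups = int(group_ids[-1]) + 1
    # start from ones so that including the initial values leaves the product unchanged
    out = torch.ones(num_groups, dtype=values.dtype, device=values.device)
    return out.scatter_reduce(0, group_ids, values, reduce="prod")


array_with_repeated_elements = torch.tensor([1, 2, 0, 0, 2, 2, 2, 1, 0, 0])
array_to_be_multiplied = torch.tensor([1., 3., 5., 2., 2., 7., 2., 4., 3., 4.])
result = run_products_torch(array_with_repeated_elements, array_to_be_multiplied)
print(result)
```

Because the product is computed directly rather than through `log` and `exp`, zeros and negative values in the input need no special handling in this sketch.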