Hi all,
I am looking for an idiomatic way to make batched multihot vectors (multihot being like a onehot but several can be hot). For example, we might have
[[1. 0. 1. 0. 0. 0.]
 [0. 1. 0. 0. 1. 0.]
 [0. 1. 0. 0. 0. 1.]]
as a sequence of three 2-hot vectors generated from the sequence [a b c], where a maps to classes 0 and 2, b maps to 1 and 4, and c maps to 1 and 5.
Suppose I have a function that maps each element in a sequence to a tuple of its class indices, and I have a set of batched sequences of the same length. Is there an idiomatic way to transform the sequences into their batched multihot representations? More specifically, I have a tensor of size (Batch Size X Sequence Length) and I want to build one of size (Batch Size X Sequence Length X Classes), where the feature dimension can be multihot based on a mapping from sequence elements to tuples of classes.
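To make the target concrete, here is a naive, loop-based reference for the transformation I am after. The dict class_map and the function name multihot_reference are made up for illustration, and I am assuming the sequence elements are integer ids. This is only a sketch of the desired output, not the idiomatic solution I am asking about.

```python
import numpy as np

# Hypothetical mapping from element id to its class indices (illustrative only):
# a -> 0, b -> 1, c -> 2 as element ids.
class_map = {0: (0, 2), 1: (1, 4), 2: (1, 5)}

def multihot_reference(arr, class_map, n_classes):
    """Naive reference. arr has shape (batch, seq_len) and holds element ids."""
    out = np.zeros((*arr.shape, n_classes), dtype=np.float32)
    # Visit every (batch, position) pair one at a time.
    for idx in np.ndindex(*arr.shape):
        classes = list(class_map[int(arr[idx])])
        # Set every class belonging to this element to 1.
        out[idx + (classes,)] = 1.0
    return out

batch = np.array([[0, 1, 2]])  # a single sequence [a b c]
result = multihot_reference(batch, class_map, n_classes=6)
print(result.shape)
print(result[0])
```

With this class_map, result[0] matches the 3 x 6 example matrix above. The Python-level loop over every element is what I would like to avoid.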
There is a onehot version like:
import numpy as np

def one_hot_encode(arr, n_labels):
    # Initialize the encoded array with one row per element of arr
    # (arr.size works for any number of dimensions)
    one_hot = np.zeros((arr.size, n_labels), dtype=np.float32)
    # Fill the appropriate elements with ones
    one_hot[np.arange(one_hot.shape[0]), arr.flatten()] = 1.
    # Finally reshape it back to the original array shape plus a class axis
    one_hot = one_hot.reshape((*arr.shape, n_labels))
    return one_hot
and I have had luck summing two batched onehot arrays together, but this strikes me as inelegant.
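For reference, the summing workaround looks roughly like the sketch below. The arrays first_cls and second_cls are hypothetical: I am assuming each element has exactly two class indices, already looked up and stored in two separate (batch, seq_len) integer arrays. The onehot step uses np.eye here only for brevity, and the np.minimum call is a guard in case both indices of an element coincide.

```python
import numpy as np

def one_hot(arr, n_labels):
    # Indexing rows of the identity matrix yields one onehot vector per entry,
    # so the result has shape (*arr.shape, n_labels).
    return np.eye(n_labels, dtype=np.float32)[arr]

# Hypothetical inputs: first and second class index of each element in [a b c].
first_cls = np.array([[0, 1, 1]])
second_cls = np.array([[2, 4, 5]])

# Sum the two onehot encodings, then cap at 1 in case the indices repeat.
two_hot = one_hot(first_cls, 6) + one_hot(second_cls, 6)
two_hot = np.minimum(two_hot, 1.0)
print(two_hot[0])
```

This needs one index array per class slot, so it only covers a fixed number of classes per element.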
Some notes:

This is for input, not for a multilabel classification output.

I realize it would be possible to just do a onehot encoding based on the Cartesian product of class pairs, but in my actual use case the number of classes and possible classes per element is such that combinatorial explosion makes onehot encoding them infeasible.
Thanks in advance.