Turning a one-hot tensor into padded indices

I have a binary input to my model that I sample from some Bernoulli distribution:

>>> import torch
>>> batch_size = 3
>>> weights = torch.tensor([0.1111, 0.2669, 0.5122, 0.2045, 0.0224]).repeat((batch_size, 1))
>>> batch = torch.bernoulli(weights)
>>> batch
tensor([[1., 0., 0., 1., 1.],
        [0., 1., 0., 0., 0.],
        [0., 1., 1., 0., 0.]])

I want to get a padded sequence of embeddings for every sample in my batch so I can run it through a transformer encoder. For that, I first need to turn batch into a tensor of indices, then pad it, and finally run it through an embedding layer.
The closest I've come is tensor.nonzero(), but I don't know how to pad the result and reshape it back to (batch_size, max_seq_len).

Desired output:

>>> func(batch, padding_value=-1)
tensor([[ 0,  3,  4],
        [ 1, -1, -1],
        [ 1,  2, -1]])
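
Downstream, the plan is to feed the padded indices through an embedding layer. A minimal sketch of that step, under the assumption that a non-negative index is reserved for padding instead of -1 (nn.Embedding can't look up negative input indices, but its padding_idx argument marks a dedicated slot that is zero-initialized and excluded from gradients); the embedding_dim of 8 is arbitrary:

>>> from torch import nn
>>> num_positions = batch.size(1)  # valid indices are 0..4
>>> pad_idx = num_positions        # reserve index 5 for padding
>>> seqs = [row.nonzero(as_tuple=True)[0] for row in batch]
>>> padded = nn.utils.rnn.pad_sequence(seqs, batch_first=True, padding_value=pad_idx)
>>> emb = nn.Embedding(num_positions + 1, 8, padding_idx=pad_idx)
>>> emb(padded).shape              # (batch_size, max_seq_len, embedding_dim)
torch.Size([3, 3, 8])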

In the meantime, I've solved this by iterating over the batch, but I'm still curious whether a more elegant, loop-free way exists (one is sketched after the workaround below).

>>> from torch import nn
>>> def to_indices(batch, padding_value=-1):
...     # Collect each sample's nonzero positions as a variable-length index tensor
...     indices = []
...     for sample in batch:
...         indices.append(sample.nonzero(as_tuple=True)[0])
...     # Pad the ragged sequences to the length of the longest one
...     return nn.utils.rnn.pad_sequence(indices, batch_first=True, padding_value=padding_value)
>>> batch
tensor([[1., 0., 0., 1., 1.],
        [0., 1., 0., 0., 0.],
        [0., 1., 1., 0., 0.]])
>>> to_indices(batch)
tensor([[ 0,  3,  4],
        [ 1, -1, -1],
        [ 1,  2, -1]])
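
For reference, here is a loop-free sketch of the same transformation. It relies on the fact that nonzero() lists coordinates in row-major order, so the column indices come out already grouped by sample and sorted within it; a boolean mask then scatters them into a pre-filled output. The helper name to_indices_vectorized is just illustrative:

>>> def to_indices_vectorized(batch, padding_value=-1):
...     # Column indices of all ones, grouped by row (nonzero is row-major)
...     cols = batch.nonzero(as_tuple=True)[1]
...     counts = batch.long().sum(dim=1)   # number of ones per sample
...     max_len = int(counts.max())
...     out = torch.full((batch.size(0), max_len), padding_value, dtype=torch.long)
...     # Keep the first counts[i] slots of row i, leave the rest padded
...     mask = torch.arange(max_len) < counts.unsqueeze(1)
...     out[mask] = cols
...     return out
>>> to_indices_vectorized(batch)
tensor([[ 0,  3,  4],
        [ 1, -1, -1],
        [ 1,  2, -1]])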