Form a 3D tensor from a 2D tensor by zero-padding unevenly sized slices

Given a 2D tensor A of size m x r and a list steps = [m_1, ..., m_k] that partitions m (i.e., sum(steps) = m).
Define max_m_i = \max_i m_i and s_i = sum(steps[:i]).
I need to create a 3D tensor of size k x max_m_i x r by padding every slice of A with respect to steps, i.e., each slice A[s_i:s_i + steps[i], :] of size m_{i+1} x r is zero-padded to size max_m_i x r.

Example:

import torch

m, r = 6, 2
A = torch.arange(0, m * r).reshape(m, r)
print("A = ", A)
steps = [3, 2, 1]
print(f"{steps = }")
max_m_i = max(steps)

pad_A = []
s_i = 0
for i in range(len(steps)):
    # zero-pad slice i from steps[i] rows up to max_m_i rows
    # (torch.zeros is float32, so torch.cat promotes the integer slice to float)
    pad_A.append(torch.cat((A[s_i:s_i + steps[i]], torch.zeros((max_m_i - steps[i], r))), dim=0))
    s_i += steps[i]
pad_A = torch.stack(pad_A)  # shape: (len(steps), max_m_i, r)
print("pad_A = ", pad_A)

Output:

A =  tensor([[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9],
        [10, 11]])
steps = [3, 2, 1]
pad_A =  tensor([[[ 0.,  1.],
         [ 2.,  3.],
         [ 4.,  5.]],

        [[ 6.,  7.],
         [ 8.,  9.],
         [ 0.,  0.]],

        [[10., 11.],
         [ 0.,  0.],
         [ 0.,  0.]]])

Creating a mask tensor might give you a performance benefit, but you would need to profile both approaches to be sure (see the rough timing sketch after the output below):

# allocate the zero-padded output and an all-False boolean mask of the same shape
out = torch.zeros(len(steps), max(steps), A.size(1), dtype=A.dtype)
mask = torch.zeros(len(steps), max(steps), A.size(1), dtype=torch.bool)

# mark the first steps[i] rows of each slice as valid
for step, row in zip(steps, mask):
    row[:step] = True

# boolean-mask assignment fills the valid positions with A's values in row-major order
out[mask] = A.view(-1)
print(out)
# tensor([[[ 0,  1],
#          [ 2,  3],
#          [ 4,  5]],

#         [[ 6,  7],
#          [ 8,  9],
#          [ 0,  0]],

#         [[10, 11],
#          [ 0,  0],
#          [ 0,  0]]])
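
To actually compare the two on your real shapes, a rough timeit sketch along these lines could work (the toy tensors here are tiny, so treat the numbers with caution and rerun with realistic m, r, and steps):

import timeit

import torch

A = torch.arange(0, 12).reshape(6, 2)
steps = [3, 2, 1]

def pad_loop():
    # cat/stack approach from the question (kept in A's integer dtype here)
    out, s = [], 0
    for step in steps:
        pad = torch.zeros(max(steps) - step, A.size(1), dtype=A.dtype)
        out.append(torch.cat((A[s:s + step], pad), dim=0))
        s += step
    return torch.stack(out)

def pad_mask():
    # mask approach from above
    out = torch.zeros(len(steps), max(steps), A.size(1), dtype=A.dtype)
    mask = torch.zeros(len(steps), max(steps), A.size(1), dtype=torch.bool)
    for step, row in zip(steps, mask):
        row[:step] = True
    out[mask] = A.view(-1)
    return out

print("loop:", timeit.timeit(pad_loop, number=10_000))
print("mask:", timeit.timeit(pad_mask, number=10_000))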

I guess there might be a more efficient way, but I didn’t find one quickly.
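
For completeness, one shorter route (not necessarily faster, and assuming plain zero padding is what you want) is to combine torch.split with torch.nn.utils.rnn.pad_sequence:

import torch
from torch.nn.utils.rnn import pad_sequence

A = torch.arange(0, 12).reshape(6, 2)
steps = [3, 2, 1]

# split A along dim 0 into chunks of the given sizes, then zero-pad each chunk
# to the longest one and stack them into a (len(steps), max(steps), r) tensor
pad_A = pad_sequence(list(torch.split(A, steps)), batch_first=True)
print(pad_A)

pad_sequence keeps A's dtype, so the result stays an integer tensor here.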