Nested list of variable length to a tensor

I might be a bit late to the party, but after realizing that PyTorch won’t spoonfeed me anymore, I ended up writing my own function to pad a list of tensors.

The following function takes a nested list of integers and converts it into a padded tensor.

import torch

def ints_to_tensor(ints):
    """
    Converts a nested list of integers to a padded tensor.
    """
    if isinstance(ints, torch.Tensor):
        return ints
    if isinstance(ints, list):
        if isinstance(ints[0], int):
            return torch.LongTensor(ints)
        if isinstance(ints[0], torch.Tensor):
            return pad_tensors(ints)
        if isinstance(ints[0], list):
            return ints_to_tensor([ints_to_tensor(inti) for inti in ints])
    # Fail loudly instead of silently returning None on unsupported input.
    raise TypeError('Expected a (nested) list of ints or tensors.')

This relies on another function, pad_tensors, shown below:

def pad_tensors(tensors):
    """
    Takes a list of `N` M-dimensional tensors (M<4) and returns a padded tensor.

    The padded tensor is `M+1` dimensional with size `N, S1, S2, ..., SM`
    where `Si` is the maximum value of dimension `i` amongst all tensors.
    """
    rep = tensors[0]
    padded_dim = []
    for dim in range(rep.dim()):
        max_dim = max([tensor.size(dim) for tensor in tensors])
        padded_dim.append(max_dim)
    padded_dim = [len(tensors)] + padded_dim
    padded_tensor = torch.zeros(padded_dim)
    padded_tensor = padded_tensor.type_as(rep)
    for i, tensor in enumerate(tensors):
        size = list(tensor.size())
        if len(size) == 1:
            padded_tensor[i, :size[0]] = tensor
        elif len(size) == 2:
            padded_tensor[i, :size[0], :size[1]] = tensor
        elif len(size) == 3:
            padded_tensor[i, :size[0], :size[1], :size[2]] = tensor
        else:
            raise ValueError('Padding is supported for up to 3-dimensional tensors.')
    return padded_tensor

The pad_tensors function only supports tensors of up to 3 dimensions, but it can easily be extended. Using these two functions should solve the issue.
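For what it’s worth, here is one possible way that extension could look (a rough sketch, not part of the answer above, and the pad_tensors_nd name is just for illustration): build the index tuple programmatically instead of special-casing 1D/2D/3D.

def pad_tensors_nd(tensors):
    """
    Dimension-agnostic variant of pad_tensors: works for tensors of any rank.
    """
    rep = tensors[0]
    padded_dim = [len(tensors)] + [
        max(tensor.size(dim) for tensor in tensors) for dim in range(rep.dim())
    ]
    padded_tensor = torch.zeros(padded_dim).type_as(rep)
    for i, tensor in enumerate(tensors):
        # Build an index like (i, 0:s1, 0:s2, ...) matching this tensor's shape.
        index = (i,) + tuple(slice(0, s) for s in tensor.size())
        padded_tensor[index] = tensor
    return padded_tensor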

As an example, here’s @rustytnt’s input:

In [4]: target = [[[1,2,3], [2,4,5,6]], [[1,2,3], [2,4,5,6], [2,4,6,7,8]]]

In [5]: ints_to_tensor(target)
Out[5]: 
tensor([[[1, 2, 3, 0, 0],
         [2, 4, 5, 6, 0],
         [0, 0, 0, 0, 0]],

        [[1, 2, 3, 0, 0],
         [2, 4, 5, 6, 0],
         [2, 4, 6, 7, 8]]])

And here’s @wangyanda’s input:

In [6]: target = [[[3,5,4], [8,5], [3]], [[6], [6,4,3,5], [7,5,3]], [[6,5],[2],[2]], [[2],[0],[0]]]

In [7]: ints_to_tensor(target)
Out[7]: 
tensor([[[3, 5, 4, 0],
         [8, 5, 0, 0],
         [3, 0, 0, 0]],

        [[6, 0, 0, 0],
         [6, 4, 3, 5],
         [7, 5, 3, 0]],

        [[6, 5, 0, 0],
         [2, 0, 0, 0],
         [2, 0, 0, 0]],

        [[2, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]])

Thank you very much!!! Works like a charm

Not sure if this helps, but here is a solution I found on SO:

And here are solutions if the lengths are the same:
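(For the same-length case, plain torch.tensor already handles the conversion directly, no padding helper needed. A quick illustration:)

import torch

same_length = [[1, 2, 3], [4, 5, 6]]
torch.tensor(same_length)
# tensor([[1, 2, 3],
#         [4, 5, 6]])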

Hi, I’m definitely late to the party, but I had this exact problem with deeply nested lists (for instance, multiple PDFs of multiple pages of multiple lines of multiple words) and came up with this solution: GitHub - aphp/foldedtensor: PyTorch extension for handling deeply nested sequences of variable length. It optimizes the conversion in C (it only handles nested Python lists, not lists of tensors, at the moment). Here is a small benchmark:

from foldedtensor import as_folded_tensor
import random
def make_nested_list(arg, *rest, value):
    """Builds a nested list; a tuple argument means a random size drawn from that range."""
    size = random.randint(*arg) if isinstance(arg, tuple) else arg
    if not rest:
        return [value] * size
    return [make_nested_list(*rest, value=value) for _ in range(size)]

Variable-length nested lists

nested_list = make_nested_list(32, (50, 100), (25, 30), value=1)

%timeit ints_to_tensor(nested_list)
# 20.4 ms ± 241 µs per loop
%timeit as_folded_tensor(nested_list)
# 1.11 ms ± 33.9 µs per loop

The mask can be accessed via as_folded_tensor(nested_list).mask, and the tensor can be “refolded” dynamically (to flatten the second or third dimension, for instance), e.g. tensor.refold(0, 2) to flatten the second dimension.
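To make that concrete, a minimal sketch using only the calls mentioned in this post (as_folded_tensor, .mask and .refold); the variable names are just examples:

from foldedtensor import as_folded_tensor

ft = as_folded_tensor([[[1, 2, 3], [2, 4]], [[1]]])
padding_mask = ft.mask   # mask marking the real (non-padded) positions
flat = ft.refold(0, 2)   # flatten the second dimension, as described above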

Same-length nested lists (compatible with torch.tensor)

nested_list = make_nested_list(32, 100, 30, value=1)

%timeit torch.tensor(nested_list)
# 11.4 ms ± 435 µs per loop
%timeit ints_to_tensor(nested_list)
# 25.6 ms ± 1.57 ms per loop
%timeit as_folded_tensor(nested_list)
# 1.77 ms ± 37 µs per loop (faster than torch.tensor 🎉)

Simple list

nested_list = make_nested_list(10000, value=1)

%timeit torch.as_tensor(nested_list)
# 1.06 ms ± 42 µs per loop
%timeit ints_to_tensor(nested_list)
# 441 µs ± 14 µs per loop
%timeit as_folded_tensor(nested_list)
# 135 µs ± 2.86 µs per loop