Feed list of tensors to torch.utils.data.TensorDataset()

Blupon · November 14, 2022, 10:49am

Hi! I would like to use torch.utils.data.TensorDataset() with a variable number of input tensors stored in a list, but I get AttributeError: 'list' object has no attribute 'size'. I understand the error but do not know the best way to work around it.

What we usually do: torch.utils.data.TensorDataset(x_train, y_train)

What I want to do: torch.utils.data.TensorDataset([x_train1, y_train1, x_train2,...]), where the input list of tensors has variable length (it is built with successive .append() calls).

The point of using a list of input tensors is that each tensor can carry an input of a different size, as long as all tensors hold the same number of samples, i.e. share a common first dimension size, which is consistent with the __getitem__() method of TensorDataset(). The length of the input list fed to TensorDataset() varies from experiment to experiment, so it cannot be hardcoded.
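For illustration, the situation described above can be reproduced with a few made-up tensors. The shapes and values below are assumptions chosen only for this sketch; the variable names follow the ones used in the post. Passing the list as a single argument makes TensorDataset() treat the list itself as its first "tensor", which is where the call to .size fails.

```python
import torch
from torch.utils.data import TensorDataset

# Hypothetical shapes: every tensor holds 8 samples (shared first dimension),
# but the per-sample shape differs from tensor to tensor.
x_train1 = torch.randn(8, 3)
y_train1 = torch.randn(8, 1)
x_train2 = torch.randn(8, 5, 2)

# The list is built incrementally, so its length is not known in advance.
tensors = []
for t in (x_train1, y_train1, x_train2):
    tensors.append(t)

# Passing the list as ONE positional argument triggers the error,
# because TensorDataset calls .size(0) on each argument it receives.
try:
    TensorDataset(tensors)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```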

The most direct alternative seems to be calling TensorDataset() once for each tensor in my list of inputs and then manually stepping through the associated dataloaders (stored in a list?): all but the first one would be advanced with a for loop inside the batch iteration loop of the first dataloader. Maybe the proper way of handling this is a collate_fn, as suggested here for images with variable sizes. I would like to avoid this if possible, since TensorDataset() is already able to handle an arbitrary number of input tensors. Apologies if I missed the solution here or on SO, and many thanks.

ptrblck · November 14, 2022, 5:48pm

You might want to write a custom Dataset (deriving from TensorDataset if needed) as described e.g. here.
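A minimal sketch of what such a custom Dataset could look like, assuming all tensors share the same first dimension. The class name TensorListDataset and the example shapes are hypothetical and not taken from the linked tutorial. The last lines also show a simpler option that may be sufficient here: since the TensorDataset constructor accepts *tensors, the list can be unpacked with * at the call site.

```python
import torch
from torch.utils.data import DataLoader, Dataset, TensorDataset


class TensorListDataset(Dataset):
    """Hypothetical dataset wrapping a list of tensors of arbitrary length."""

    def __init__(self, tensors):
        tensors = list(tensors)
        if not tensors:
            raise ValueError("Expected at least one tensor")
        # All tensors must hold the same number of samples.
        n_samples = tensors[0].size(0)
        if any(t.size(0) != n_samples for t in tensors):
            raise ValueError("Size mismatch between tensors")
        self.tensors = tensors

    def __getitem__(self, index):
        # One sample is a tuple with the index-th row of every tensor.
        return tuple(t[index] for t in self.tensors)

    def __len__(self):
        return self.tensors[0].size(0)


# Made-up data: 8 samples, different per-sample shapes.
tensors = [torch.randn(8, 3), torch.randn(8, 1), torch.randn(8, 5, 2)]

dataset = TensorListDataset(tensors)
loader = DataLoader(dataset, batch_size=4)
first_batch = next(iter(loader))  # one batched tensor per input tensor

# Simpler alternative: unpack the list into positional arguments.
unpacked = TensorDataset(*tensors)
```

Both variants return one entry per input tensor for each sample, so the training loop can iterate over a single DataLoader regardless of how many tensors were appended to the list.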