Best way to convert a list to a tensor?

Let a = [1, 2, 3], and then b = torch.Tensor(a); PyCharm then highlights the call with a yellow warning background.
Is there an elegant way to convert a list to a tensor? Or is this my IDE's fault?


Convert the list to a tensor using this:

a = [1, 2, 3]
b = torch.FloatTensor(a)

Your method should also work, but you should cast the data type to float so you can use the tensor in a neural net.
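
As a minimal sketch of that explicit cast (either at construction time or afterwards):

import torch

a = [1, 2, 3]
b = torch.tensor(a, dtype=torch.float32)  # cast while constructing
c = torch.tensor(a).float()               # or cast an existing tensor
print(b.dtype, c.dtype)  # torch.float32 torch.float32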


Hi,
First of all, PyCharm (like most IDEs) cannot really analyze libraries like PyTorch, which has a C++ backend and a Python frontend, so it is normal to get warnings or missing-reference errors; your code works fine.

But about your question:
torch.Tensor() will convert your data to the default floating-point type, Float; torch.Tensor and torch.FloatTensor do the same thing.
But I think the better way is to use torch.tensor() (note the lowercase 't'). It converts your data to a tensor while keeping the data type, which is crucial for some methods. You may know that PyTorch and numpy are interchangeable with each other, so if your array is int, your tensor should be int too, unless you explicitly change the type.

On top of all this, torch.tensor() is the convention because it lets you set the following arguments:
device, dtype, requires_grad, etc.

Note: torch.tensor() allocates new memory and copies the data. If you want to avoid the copy, use torch.as_tensor(numpy_ndarray).
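
For illustration, a small sketch of those arguments (the device='cuda' line assumes a GPU is available, so it is left commented out):

import torch

a = [1, 2, 3]
t = torch.tensor(a)  # dtype is inferred from the data: torch.int64
f = torch.tensor(a, dtype=torch.float32, requires_grad=True)  # explicit dtype, grad enabled
# g = torch.tensor(a, device='cuda')  # the target device can be set the same way
print(t.dtype, f.dtype, f.requires_grad)  # torch.int64 torch.float32 True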

Bests
Nik


1000 thanks! Your reply really helped me a lot!

You are welcome mate!

Regarding @zimmer550's point that you need to convert to float to use your tensors in a NN: that is a rule of thumb. In some cases, such as certain methods in the functional package, you need the Long data type instead. So the best approach is to retain the data type as it is and change it explicitly when you need to; that way you can debug much faster when a data-type inconsistency exists.
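
As a concrete sketch of that Long-dtype case: torch.nn.functional.cross_entropy expects its class-index targets as Long, so blindly casting everything to float would break it:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)           # float activations from a model
targets = torch.tensor([1, 0, 3, 9])  # class indices: must stay torch.int64 (Long)
loss = F.cross_entropy(logits, targets)
print(loss.item())
# F.cross_entropy(logits, targets.float())  # raises an error: indices must be Long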

In PyCharm, Ctrl+Q (I use my config from 2017, so this may have changed) shows the documentation within your editor, and I always recommend reading it fully. There are many notes in there that can save you a lot of memory or runtime, often just by passing an argument or calling a different function.

I am doing something very similar but I have a (nested) list of tensors. The simplest example I have is the following:

import torch

# trying to convert a list of tensors to a torch.tensor

x = torch.randn(3, 1)
xs = [x, x]
# xs = torch.tensor(xs)
xs = torch.as_tensor(xs)

but I get the following error:

Traceback (most recent call last):
  File "/Users/brando/anaconda3/envs/automl-meta-learning/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-30-f846b7bfae21>", line 4, in <module>
    xs = torch.as_tensor(xs)
ValueError: only one element tensors can be converted to Python scalars

Any ideas what is going on? Btw, torch.tensor and torch.as_tensor both give the same error.


I guess the following works, but I am unsure what is wrong with this solution:

# %%

import torch

# trying to convert a list of tensors to a torch.tensor

x = torch.randn(3)
xs = [x.numpy(), x.numpy()]
# xs = torch.tensor(xs)
xs = torch.as_tensor(xs)

print(xs)
print(xs.size())

# %%

import torch

# trying to convert a list of tensors to a torch.tensor

x = torch.randn(3)
xs = [x.numpy(), x.numpy(), x.numpy()]
xs = [xs, xs]
# xs = torch.tensor(xs)
xs = torch.as_tensor(xs)

print(xs)
print(xs.size())

output:

tensor([[[0.3423, 1.6793, 0.0863],
         [0.3423, 1.6793, 0.0863],
         [0.3423, 1.6793, 0.0863]],
        [[0.3423, 1.6793, 0.0863],
         [0.3423, 1.6793, 0.0863],
         [0.3423, 1.6793, 0.0863]]])
torch.Size([2, 3, 3])

Hi,

I think torch.tensor — PyTorch 1.7.0 documentation and torch.as_tensor — PyTorch 1.7.0 documentation explain the difference clearly, but in summary: torch.tensor always copies the data, while torch.as_tensor tries to avoid that. In both cases, they don't accept a sequence of tensors.
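
A quick sketch of that copy vs. no-copy behaviour, using a numpy array as the source (torch.as_tensor can share the buffer, torch.tensor cannot):

import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])
copied = torch.tensor(arr)     # always copies the data
shared = torch.as_tensor(arr)  # reuses the numpy buffer when dtype/device allow it

arr[0] = 100.0
print(copied[0].item())  # 1.0   -> owns its own memory, unaffected
print(shared[0].item())  # 100.0 -> sees the mutation, memory is shared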

The more intuitive way is stacking along a given dimension, which you can find here: How to turn a list of tensor to tensor? - PyTorch Forums

The problem with your approach is that by converting your tensors to numpy you lose the grads and break the computational graph, whereas stacking preserves it.
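
A small sketch of that difference: stacking keeps the result connected to the autograd graph, while the numpy round trip cannot even start from a tensor that requires grad:

import torch

x = torch.randn(3, requires_grad=True)
stacked = torch.stack([x, x])  # still part of the graph
print(stacked.requires_grad)   # True
stacked.sum().backward()
print(x.grad)                  # gradients flow back to x

# x.numpy() raises a RuntimeError here because x requires grad;
# x.detach().numpy() works but, as the name says, detaches from the graph.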

Actually, you have posted an answer to a similar issue too! :wink: How to turn a list of tensor to tensor? - #10 by Brando_Miranda


Hahaha, yeah, I see I didn't form a very solid memory/understanding when I wrote that last year (or has it been too long since?). Perhaps I can finally sort this out in my head, though in my defence, there does seem to be a lot of discussion surrounding this topic (which isn't helping me digest it):

It seems that the best PyTorch-y solution comes from knowing either torch.cat or torch.stack. In my use case I generate tensors, conceptually need to nest them in lists, and eventually convert that to a final tensor (e.g. of size [d1, d2, d3]). I think the easiest solution to my problem is to append things to a list, give it to torch.stack to form the new tensor, then append that to a new list, and convert that to a tensor by using torch.stack again, recursively.

For a non-recursive example, I think this works… will update with a better example in a bit:

# %%

import torch

# stack vs cat

# cat "extends" a list in the given dimension e.g. adds more rows or columns

x = torch.randn(2, 3)
print(f'{x.size()}')

# add more rows (each column grows from 2 to 6 entries)
xnew_from_cat = torch.cat((x, x, x), 0)
print(f'{xnew_from_cat.size()}')

# add more columns (each row grows from 3 to 9 entries)
xnew_from_cat = torch.cat((x, x, x), 1)
print(f'{xnew_from_cat.size()}')

print()

# stack plays the same role as append does for lists: it doesn't change the original
# tensors but instead adds a new dimension to the result, so you retain the ability
# to get back each original tensor by indexing along the new dimension
xnew_from_stack = torch.stack((x, x, x, x), 0)
print(f'{xnew_from_stack.size()}')

xnew_from_stack = torch.stack((x, x, x, x), 1)
print(f'{xnew_from_stack.size()}')

xnew_from_stack = torch.stack((x, x, x, x), 2)
print(f'{xnew_from_stack.size()}')

# by default, stack inserts the new dimension at the front (dim 0)
xnew_from_stack = torch.stack((x, x, x, x))
print(f'{xnew_from_stack.size()}')

print('I like to think of xnew_from_stack as a \"tensor list\" that you can pop from the front')

print()

lst = []
print(f'{x.size()}')
for i in range(10):
    x = x + i  # say we do something with x at iteration i (out-of-place on purpose:
    # with the in-place x += i, every list entry would alias the same tensor
    # and end up identical after the loop)
    lst.append(x)
# lstt = torch.stack([x for _ in range(10)])
lstt = torch.stack(lst)
print(lstt.size())

print()

Update: with a nested list whose sublists all have the same length

def tensorify(lst):
    """
    Turn a nested list of tensors (with no varying lengths within a dimension)
    into a single tensor.
    Nested list with lengths [D1, D2, ..., DN] -> tensor of shape [D1, D2, ..., DN].

    :return: tensor of shape [D1, D2, ..., DN]
    """
    # base case: the current list is not nested any more, so make it into a tensor
    if not isinstance(lst[0], list):
        if isinstance(lst, torch.Tensor):
            return lst
        elif isinstance(lst[0], torch.Tensor):
            return torch.stack(lst, dim=0)
        else:  # the elements of lst are floats, ints, or similar
            return torch.tensor(lst)
    # recursive case: tensorify each sublist, then stack the results
    current_dimension_i = len(lst)
    for d_i in range(current_dimension_i):
        lst[d_i] = tensorify(lst[d_i])
    # after the loop, lst[d_i] is a tensor of shape [D_{i+1}, ..., DN]
    return torch.stack(lst, dim=0)

Here are a few unit tests (I didn't write more, but it worked with my real code, so I trust it's fine; feel free to help me by adding more tests if you want):


def test_tensorify():
    t = [1, 2, 3]
    print(tensorify(t).size())
    tt = [t, t, t]
    print(tensorify(tt))
    ttt = [tt, tt, tt]
    print(tensorify(ttt))

if __name__ == '__main__':
    test_tensorify()
    print('Done\a')

When the nested lists have variable lengths

There are related threads for the variable-length case. I didn't read those very carefully, but I assume they must be padding somehow (my guess is you need to calculate the largest length/dimension to pad to, and then do some sort of recursion like I did above).
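
For the flat (one-level) case, a sketch using torch.nn.utils.rnn.pad_sequence, which pads a list of variable-length tensors up to the longest one; the nested/recursive case would still need something hand-rolled like tensorify above:

import torch
from torch.nn.utils.rnn import pad_sequence

seqs = [torch.randn(5), torch.randn(3), torch.randn(7)]
padded = pad_sequence(seqs, batch_first=True, padding_value=0.0)
print(padded.size())  # torch.Size([3, 7])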


If someone is looking into the performance aspects of this, I’ve done a small experiment. In my case, I needed to convert a list of scalar tensors into a single tensor.

import torch
torch.__version__ # 1.10.2
x = [torch.randn(1) for _ in range(10000)]
torch.cat(x).shape, torch.stack(x).shape # torch.Size([10000]), torch.Size([10000, 1])

%timeit torch.cat(x) # 1.5 ms ± 476 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit torch.cat(x).reshape(-1,1) # 1.95 ms ± 534 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit torch.stack(x) # 5.36 ms ± 643 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

My conclusion is that even if you want the additional dimension that torch.stack produces, using torch.cat and then reshape is faster.
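
For what it's worth, a quick sanity check (a sketch) that the reshape really reproduces what torch.stack builds for these scalar tensors:

import torch

x = [torch.randn(1) for _ in range(10000)]
print(torch.equal(torch.cat(x).reshape(-1, 1), torch.stack(x)))  # True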


That works, but in many other cases torch.FloatTensor(a) produces
TypeError: expected TensorOptions(dtype=float, device=cpu ...
Therefore, Nikronic's answer, torch.tensor(), is often better.
Sometimes torch.from_numpy is the right tool.
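
One way to run into that TypeError (a sketch; the assumption here is a float64 numpy array as input, which is numpy's default):

import numpy as np
import torch

arr = np.array([1.0, 2.0])   # numpy defaults to float64
# torch.FloatTensor(arr)     # TypeError: expected TensorOptions(dtype=float, ...)
t1 = torch.tensor(arr)       # works, keeps float64
t2 = torch.from_numpy(arr)   # works, and shares memory with arr
print(t1.dtype, t2.dtype)    # torch.float64 torch.float64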