How to convert array to tensor?

jaeyung1001 · November 5, 2018, 12:11pm

My data looks like this:

X_train =
[1,0,0,0,0,0]
[0,0,0,0,0,1]
[0,1,0,0,0,0]
…

and I want to convert it to a tensor:
x_train_tensor = Variable(torch.Tensor(X_train.values))

but I get an error like this:

TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: double, float, float16, int64, int32, and uint8.

How can I fix this error?

lelouedec · November 5, 2018, 12:12pm

use

torch.from_numpy()

jaeyung1001 · November 5, 2018, 12:15pm

X_train is a pandas Series, so I used:

torch.from_numpy(pd.Series.as_matrix(X_train))

but I got the same error.

lelouedec · November 5, 2018, 12:17pm

Try

torch.from_numpy(X_train.values)

jaeyung1001 · November 5, 2018, 12:20pm

I got the same error… haha

lelouedec · November 5, 2018, 12:27pm

What exactly do you want to do? X_train.values gives you a numpy array, so torch.from_numpy should correctly return a Tensor.

jaeyung1001 · November 5, 2018, 12:31pm

I just want to convert my dataframe to a tensor.

Here is the output of X_train.values:

array([list([0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 4, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]),
       list([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
       list([0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1]),
...

jaeyung1001 · November 5, 2018, 12:56pm

Oh… sorry, I think my X_train data format was wrong. I resolved it like this:

temp = []
for i in X_train:
    temp.append(i)

and then used torch.from_numpy(temp), and it works.

Thank you :)

erikxiong · May 23, 2019, 2:14am

Yes, you should pass the values as a 2D list, like

x = torch.Tensor(list(X_train.values))
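A minimal sketch of that idea, assuming X_train is a pandas Series whose cells are equal-length Python lists (the three rows are copied from the first post; everything else is illustrative). It uses torch.tensor with an explicit dtype instead of torch.Tensor, which is a substitution on my part, not something suggested in the thread:

```python
import pandas as pd
import torch

# Hypothetical stand-in for the original X_train: one list per row
X_train = pd.Series([
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0],
])

# .values is a 1-D object array of lists, which torch.from_numpy rejects
print(X_train.values.dtype)

# Converting to a nested Python list first gives PyTorch a regular 2-D structure
x_train_tensor = torch.tensor(X_train.tolist(), dtype=torch.float32)
print(x_train_tensor.shape)
```

This only works when every row has the same length; rows of different lengths would raise a ValueError.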

Beyazit_Bestami_Yuks · April 4, 2020, 11:15am

Now I get this one:

Expected object of scalar type Double but got scalar type Float for argument #2 'mat2' in call to _th_mm

ptrblck · April 5, 2020, 3:06am

Make sure to pass the input tensor in the same data type as the layer parameters.
This error is often raised if you’ve created the input tensor from numpy arrays, since numpy uses float64 as the default type while PyTorch uses float32.
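As a rough illustration of the dtype mismatch described above, the sketch below builds a float64 numpy array and casts it to float32 before passing it to a layer. The layer sizes and variable names are made up for the example:

```python
import numpy as np
import torch

# numpy creates float64 arrays by default
x_np = np.random.rand(4, 3)
print(x_np.dtype)

# nn.Linear parameters are float32 by default
layer = torch.nn.Linear(3, 2)

# torch.from_numpy keeps float64 (Double); .float() casts to float32
x = torch.from_numpy(x_np).float()
out = layer(x)
print(x.dtype, out.shape)
```

Casting the model with layer.double() would also make the types agree, at the cost of using float64 everywhere.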

kendreaditya · April 14, 2020, 4:25pm

I got the same error:

dataset = torch.from_numpy(np.load("dataset.npy", allow_pickle=True))

TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, int64, int32, int16, int8, uint8, and bool.

I tried converting the ndarray into a Python list like this, but got another error (I think because it's a 2D array/list):

dataset = np.load("dataset.npy", allow_pickle=True)
dataset = torch.Tensor(list(dataset))
ValueError: expected sequence of length 64 at dim 2 (got 2)

An object in the dataset has the shape [[64,64],2].

ptrblck · April 15, 2020, 4:06am

torch.from_numpy expects a single numpy array in the data types mentioned in the error message.
Your dataset.npy seems to contain an array with different objects in it.
I’m not sure if they only have a different shape or also differ in other attributes, but you would have to make sure to create a single array out of your data before passing it to torch.from_numpy.
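One way to "create a single array" for data like this could look as follows. It is only a sketch: it assumes each item is a (64x64 array, length-2 one-hot vector) pair, as described above, and the list temp is a hypothetical stand-in for the contents of dataset.npy:

```python
import numpy as np
import torch

rng = np.random.default_rng(0)

# Hypothetical stand-in for the loaded objects: (data, one-hot label) pairs
temp = [(rng.random((64, 64)), np.array([1.0, 0.0])) for _ in range(4)]

# Stack each component separately so every resulting array has one regular shape
data = torch.from_numpy(np.stack([item[0] for item in temp])).float()
labels = torch.from_numpy(np.stack([item[1] for item in temp])).float()
print(data.shape, labels.shape)
```

Keeping the data and the labels in two separate tensors avoids the mixed-shape container that forces numpy into dtype=object.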

kendreaditya · April 15, 2020, 8:46pm

I solved the problem the way you suggested, by iterating through the data as single arrays and rebuilding the dataset. Is there any way to convert all arrays within an ndarray to tensors?

ptrblck · April 15, 2020, 11:28pm

Good to hear you solved it!
Did all arrays inside the ndarray have the same shape? If so, you could make sure to create a single numpy array, which doesn’t store each element as an object.
Varying shapes will create such an “object array”:

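import numpy as np

x = np.array([[1., 2.], [3., 4.]])
print(x.dtype)  # float64
# NumPy 1.24+ raises a ValueError for ragged input unless dtype=object is passed;
# older versions inferred the object dtype automatically
y = np.array([[1., 2.], [3.]], dtype=object)
print(y.dtype)  # object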

kendreaditya · April 15, 2020, 11:55pm

I did have varying shapes, but I solved the problem by converting both the model data and the one-hot vector to tensors individually, so my code looked like this:

# temp contains NumPy objects
dataset = []
for item in temp:
    # item[0] contains the data; item[1] contains the one-hot vector
    dataset.append([torch.Tensor(item[0]), torch.Tensor(item[1])])

Since the numpy array contains objects instead of float64 or another primitive data type, is there a way to avoid creating another array?

ptrblck · April 16, 2020, 12:55am

I don’t think there is a simple operation for it and your approach looks valid.
Once nested tensors are ready, it might be a simple from_numpy

wahyubram82 · May 17, 2020, 5:52pm

I'm sorry, I'm new to PyTorch and trying to learn how to load a dataset, but I'm having trouble with it.
I have numpy data in pandas (from speech). How do I load it into PyTorch?
My data in pandas looks like this:

array([array([[-0.00027978],
       [-0.00027978],
       [-0.00027978],
       ...,
       [ 0.02842325],
       [ 0.02795309],
       [ 0.03359528]], dtype=float32),
       array([[ 0.00029535],
       [ 0.00029535],
       [ 0.00029535],
       ...,
       [ 0.00383177],
       [ 0.00245112],
       [-0.00314868]], dtype=float32)], dtype=object)

It's a data series.

When I try to turn it into a tensor dataset, I get an error:
train_set = TensorDataset(torch.from_numpy(np.array(train_pd.segmentasi.values)))

TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, int64, int32, int16, int8, uint8, and bool.

I think this is similar to the other questions, but when I tried to follow them, nothing worked for me.
I have already read many questions that may be related, and got a clue like this:

np.array((train_pd.segmen.values).tolist())

So I wrote:
train_set = TensorDataset(torch.from_numpy(np.array(train_pd.segmentasi.values.tolist())))
It still fails, because the array dtype becomes object again.

Then I tried to set the dtype with:

np.array((train_pd.segmen.values).tolist(),dtype=np.float32)

using the command:
train_set = TensorDataset(torch.from_numpy(np.array((train_pd.segmentasi.values).tolist(),dtype=np.float32)))

It failed again, but this time the error said:

ValueError: setting an array element with a sequence

Please, I truly need advice; I'm a complete novice with numpy and torch.

About the data: I already tried plotting it with matplotlib and it shows a waveform, so I think the data is fine:

#test the data
segment_data = train_pd['segment'][2]
print(len(segment_data))

# show segmentation
plt.plot(segment_data)
plt.show()

ptrblck · May 18, 2020, 7:17am

The issue is that your numpy array has dtype=object, which might come from mixed dtypes or shapes, if I’m not mistaken.
The output also looks as if you are working with nested arrays. Could you try to print the shapes of all “internal” arrays and try to create a single array via e.g. np.stack?
Once you have a single array with a valid dtype, you could use torch.from_numpy.
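The suggestion above could be sketched roughly like this. The helper name stack_segments and the sample values are assumptions made for illustration; the real input would be something like train_pd.segmentasi.values:

```python
import numpy as np
import torch


def stack_segments(values):
    """Stack a 1-D object array of equally shaped arrays into one tensor."""
    shapes = [np.shape(seg) for seg in values]
    if len(set(shapes)) != 1:
        # np.stack needs identical shapes, so report what was found instead
        raise ValueError(f"cannot stack, found differing shapes: {sorted(set(shapes))}")
    return torch.from_numpy(np.stack(list(values)))


# Hypothetical stand-in for the object array returned by the pandas column
values = np.empty(2, dtype=object)
values[0] = np.zeros((4, 1), dtype=np.float32)
values[1] = np.ones((4, 1), dtype=np.float32)

batch = stack_segments(values)
print(batch.shape, batch.dtype)
```

If the helper raises, the printed shapes show which segments differ, which is the information needed to decide how to make them uniform.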

wahyubram82 · May 19, 2020, 12:38pm

I don't think that's the cause: all of my data comes from the same function, which produces fragmented speech sounds.

So it's similar to a list of lists, but in numpy form.
Each item is a numpy array of type float32.

All of this is saved to a pandas dataframe.
The problem is that the 'main list' is a numpy object array.
Each row of the dataframe looks like this:

file_wav               transcription              segmendata
path/file1.wav     'one day vacation'     array([[-0.00027978],
                                                 [-0.00027978],
                                                 [-0.00027978],
                                                 [ 0.03359528]], dtype=float32)

So if I access only one row, like:

segment_data = train_pd['segmendata'][4]

it has the form shown in the dataframe above, which is why I can plot it with matplotlib easily.

But when I get all of the rows with:

a = train_pd['segmendata'].values

the outermost numpy array wraps them with dtype=object.

If, as you said, this comes from items of different form in the pandas column, the only possibility I can think of is that there is an empty item.

Well, I need to figure out how to check for that.
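One possible way to check for empty or odd-length items is sketched below. The Series is a hypothetical stand-in for train_pd['segmendata'], and the padding step with pad_sequence is an extra option that was not proposed in the thread; it is only one way to handle segments of different lengths:

```python
import numpy as np
import pandas as pd
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical stand-in for train_pd['segmendata']: one array per row
cells = np.empty(3, dtype=object)
cells[0] = np.full((5, 1), 0.1, dtype=np.float32)
cells[1] = np.empty((0, 1), dtype=np.float32)  # an empty segment
cells[2] = np.full((3, 1), 0.2, dtype=np.float32)
segments = pd.Series(cells)

# Number of samples per row; rows with length 0 are the empty items
lengths = segments.map(len)
empty_rows = lengths[lengths == 0].index.tolist()
print(lengths.tolist(), empty_rows)

# Drop the empty rows, then zero-pad the rest to the longest segment
kept = segments[lengths > 0]
padded = pad_sequence([torch.from_numpy(seg) for seg in kept], batch_first=True)
print(padded.shape)
```

Speech fragments usually differ in length, so the object dtype may come from that rather than from an empty item; the lengths list makes either case visible.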