How to convert an array to a tensor?

My data looks like this:

X_train =
[1,0,0,0,0,0]
[0,0,0,0,0,1]
[0,1,0,0,0,0]

and I want to convert it to a tensor:
x_train_tensor = Variable(torch.Tensor(X_train.values))

but I get this error:

TypeError: can’t convert np.ndarray of type numpy.object_. The only supported types are: double, float, float16, int64, int32, and uint8.

How can I fix this error?


Use

torch.from_numpy()
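For reference, a minimal sketch of the conversion on a plain numeric array (the values are the three sample rows from the question; wrapping the result in Variable has not been needed since PyTorch 0.4). torch.from_numpy keeps the array's dtype and shares its memory instead of copying:

```python
import numpy as np
import torch

# A numeric 2D array: all rows have the same length, so the dtype is int64, not object
X_train = np.array([[1, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0, 1],
                    [0, 1, 0, 0, 0, 0]], dtype=np.int64)

x_train_tensor = torch.from_numpy(X_train)
print(x_train_tensor.dtype, x_train_tensor.shape)  # torch.int64 torch.Size([3, 6])

# from_numpy does not copy: the tensor and the array share the same memory
X_train[0, 0] = 7
print(x_train_tensor[0, 0].item())  # 7
```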


X_train is a pandas Series,

and I used:
torch.from_numpy(pd.Series.as_matrix(X_train))

but I got the same error.


Try

torch.from_numpy(X_train.values)

I got the same error… haha

What exactly do you want to do? X_train.values gives you a numpy array, so torch.from_numpy should correctly return a tensor.


I just want to convert my DataFrame to a tensor.

Here is the output of X_train.values:

array([list([0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 4, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]),
       list([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
       list([0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1]),
...
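This output explains the error: every element of X_train is a Python list, so .values returns a one-dimensional array of dtype object instead of a 2D numeric array, and torch.from_numpy refuses object arrays. A small reproduction with a hypothetical three-row Series:

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical Series whose elements are Python lists, like X_train in the question
X_train = pd.Series([[1, 0, 0], [0, 0, 1], [0, 1, 0]])

values = X_train.values
print(values.dtype, values.shape)  # object (3,)

try:
    tensor = torch.from_numpy(values)
except TypeError as err:
    # from_numpy only accepts numeric/bool dtypes, so the object array is rejected
    tensor = None
    print("TypeError:", err)
```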

Oh… sorry, I suppose my X_train data format is wrong… I resolved it like this:

temp = []
for i in X_train:
    temp.append(i)

and then used torch.from_numpy(np.array(temp)), and it works.

Thank you :)
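When every row has the same length, np.stack builds the single numeric array in one call instead of a Python loop. A sketch with a hypothetical Series of equal-length lists:

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical Series of equal-length lists (its .values is an object array)
X_train = pd.Series([[1, 0, 0], [0, 0, 1], [0, 1, 0]])

# np.stack turns the sequence of rows into one 2D array with a numeric dtype
stacked = np.stack(X_train.values)
x_train_tensor = torch.from_numpy(stacked)
print(x_train_tensor.shape)  # torch.Size([3, 3])
```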


Yes, you should turn the values into a 2D list, like:

x = torch.Tensor(list(X_train.values))
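Note the dtype this produces: the legacy torch.Tensor constructor always returns the default floating-point type (float32 unless changed), whereas torch.tensor infers the dtype from the data. A sketch using a hypothetical object array of lists in place of X_train.values:

```python
import numpy as np
import torch

rows = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
# Hypothetical object array of lists, mimicking X_train.values in the question
values = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    values[i] = row

a = torch.Tensor(list(values))  # legacy constructor: default float dtype
b = torch.tensor(list(values))  # infers int64 from the Python ints
print(a.dtype, a.shape)  # torch.float32 torch.Size([3, 3])
print(b.dtype, b.shape)  # torch.int64 torch.Size([3, 3])
```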

Now I get this error instead :frowning:

Expected object of scalar type Double but got scalar type Float for argument #2 ‘mat2’ in call to _th_mm

Make sure to pass the input tensor with the same data type as the layer parameters.
This error is often raised if you’ve created the input tensor from numpy arrays, since numpy uses float64 as the default type, while PyTorch uses float32.
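A minimal sketch of the mismatch and the fix, assuming a hypothetical nn.Linear(6, 2) layer and random input: the tensor returned by torch.from_numpy stays float64, so it has to be cast with .float() (or the array with astype(np.float32)) before it is passed to the float32 layer.

```python
import numpy as np
import torch
import torch.nn as nn

layer = nn.Linear(6, 2)      # parameters are float32 by default
data = np.random.rand(4, 6)  # NumPy's default float dtype is float64
x = torch.from_numpy(data)   # keeps float64 (torch.double)

try:
    layer(x)                 # double input with float weights raises an error
except RuntimeError as err:
    print("RuntimeError:", err)

out = layer(x.float())       # cast the input to float32 to match the layer
print(out.dtype, out.shape)  # torch.float32 torch.Size([4, 2])
```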


I got the same error:

dataset = torch.from_numpy(np.load("dataset.npy", allow_pickle=True))

TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, int64, int32, int16, int8, uint8, and bool.

I tried converting the ndarray into a Python list like this, but got another error (I think because it’s a 2D array/list):

dataset = np.load("dataset.npy", allow_pickle=True)
dataset = torch.Tensor(list(dataset))
ValueError: expected sequence of length 64 at dim 2 (got 2)

An object in the dataset has the shape [[64, 64], 2].

torch.from_numpy expects a single numpy array with one of the data types listed in the error message.
Your dataset.npy seems to contain an array with different objects in it.
I’m not sure if they only have a different shape or also differ in other attributes, but you would have to make sure to create a single array out of your data before passing it to torch.from_numpy.
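For pairs like these, a sketch of that idea (assuming every data array is 64x64 and every label vector has length 2; the random object array below only stands in for dataset.npy) is to split the pairs and stack data and labels separately, so each ends up as one numeric array:

```python
import numpy as np
import torch

# Placeholder for np.load("dataset.npy", allow_pickle=True):
# an object array of N pairs (64x64 data array, 2-element one-hot vector)
n = 5
dataset = np.empty((n, 2), dtype=object)
for i in range(n):
    dataset[i, 0] = np.random.rand(64, 64)
    dataset[i, 1] = np.eye(2)[i % 2]

# Stack each half into its own numeric array, then convert without copying
data = torch.from_numpy(np.stack([item[0] for item in dataset]).astype(np.float32))
labels = torch.from_numpy(np.stack([item[1] for item in dataset]).astype(np.float32))
print(data.shape, labels.shape)  # torch.Size([5, 64, 64]) torch.Size([5, 2])
```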

I solved the problem the way you suggested, by iterating through the data as single arrays and rebuilding the dataset. Is there a way to convert all arrays inside an ndarray to tensors?

Good to hear you solved it!
Did all arrays inside the ndarray have the same shape? If so, you could make sure to create a single numpy array, which doesn’t store each element as an object.
Variant shapes create such an “object array” (since NumPy 1.24, a ValueError is raised instead unless you pass dtype=object explicitly):

import numpy as np

x = np.array([[1., 2.], [3., 4.]])
print(x.dtype)  # float64
y = np.array([[1., 2.], [3.]], dtype=object)  # ragged rows
print(y.dtype)  # object

I did have varying shapes, but I solved the problem by converting both the model data and the one-hot vector to tensors individually, so my code looked like this:

import torch

# temp contains NumPy objects: each item is a (data, one-hot vector) pair
dataset = []
for item in temp:
    # item[0] contains the data; item[1] contains the one-hot vector
    dataset.append([torch.Tensor(item[0]), torch.Tensor(item[1])])

Since the numpy array contains objects instead of float64 or any other primitive data type, is there a way to avoid creating another array?

I don’t think there is a simple operation for it and your approach looks valid.
Once nested tensors are ready, it might be a simple from_numpy :wink:
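In the meantime, if the goal is just to avoid copying the data, calling torch.from_numpy on each element works even when the shapes differ: each tensor shares memory with its array, and only a Python list of tensors is created. A sketch with placeholder arrays:

```python
import numpy as np
import torch

# Object array holding arrays of different shapes (placeholder data)
arrays = np.empty(2, dtype=object)
arrays[0] = np.zeros((2, 3))
arrays[1] = np.ones((4, 3))

tensors = [torch.from_numpy(a) for a in arrays]
print([t.shape for t in tensors])  # [torch.Size([2, 3]), torch.Size([4, 3])]

# No copy was made: modifying the array is visible through the tensor
arrays[0][0, 0] = 5.0
print(tensors[0][0, 0].item())  # 5.0
```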


I’m sorry, I’m new to PyTorch and trying to learn how to load a dataset, but I’m having trouble with it.
I have NumPy data (from speech) in pandas; how do I load it into PyTorch?
My data in pandas looks like this:

array([array([[-0.00027978],
       [-0.00027978],
       [-0.00027978],
       ...,
       [ 0.02842325],
       [ 0.02795309],
       [ 0.03359528]], dtype=float32),
       array([[ 0.00029535],
       [ 0.00029535],
       [ 0.00029535],
       ...,
       [ 0.00383177],
       [ 0.00245112],
       [-0.00314868]], dtype=float32)], dtype=object)

It’s a pandas Series…

I get an error when I try to turn it into a TensorDataset:
train_set = TensorDataset(torch.from_numpy(np.array(train_pd.segmentasi.values)))

TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, int64, int32, int16, int8, uint8, and bool.

I think it’s similar to the other questions, but when I try to follow the answers, nothing works for me…
I’ve already read many questions that might be related and found this hint:

np.array((train_pd.segmen.values).tolist())

So I wrote:
train_set = TensorDataset(torch.from_numpy(np.array(train_pd.segmentasi.values.tolist())))
It still fails, because the array’s dtype becomes object again.

Then I tried setting the dtype with:

np.array((train_pd.segmen.values).tolist(),dtype=np.float32)

using this command:
train_set = TensorDataset(torch.from_numpy(np.array((train_pd.segmentasi.values).tolist(),dtype=np.float32)))

It fails again, but this time with:

ValueError: setting an array element with a sequence

Please, I truly need advice; I’m a complete novice with NumPy and PyTorch.

About the data: I already tried plotting it with matplotlib, and it shows a waveform, so I think the data is fine:

import matplotlib.pyplot as plt

# test the data
segment_data = train_pd['segment'][2]
print(len(segment_data))  # prints 156500

# show segmentation
plt.plot(segment_data)
plt.show()


The issue is that your numpy array has dtype=object, which might come from mixed dtypes or shapes, if I’m not mistaken.
The output also looks as if you are working with nested arrays. Could you try to print the shapes of all “internal” arrays and try to create a single array via e.g. np.stack?
Once you have a single array with a valid dtype, you could use torch.from_numpy.
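A minimal sketch of that check, assuming the column is named segmendata as in the later post (the toy DataFrame below stands in for train_pd): collect the shapes of the internal arrays and only stack them when they all agree.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import TensorDataset

# Toy stand-in for train_pd: each cell holds a float32 array of shape (length, 1)
cells = np.empty(2, dtype=object)
cells[0] = np.zeros((5, 1), dtype=np.float32)
cells[1] = np.ones((5, 1), dtype=np.float32)
train_pd = pd.DataFrame({"segmendata": cells})

segments = train_pd["segmendata"].values  # 1D object array
shapes = {seg.shape for seg in segments}
print(shapes)  # {(5, 1)}

if len(shapes) == 1:
    # All segments share one shape: stack into a single float32 array
    stacked = np.stack(segments)  # shape (N, length, 1)
    train_set = TensorDataset(torch.from_numpy(stacked))
    print(len(train_set), stacked.shape)  # 2 (2, 5, 1)
else:
    print("segments have different shapes:", shapes)
```

If the shapes differ, as is common for speech segments of different durations, np.stack raises a ValueError, so the segments cannot be combined into one tensor as they are.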

I think… that’s not possible: all my data comes from the same function, which produces fragments of speech sound…

So it’s similar to a list of lists, but in NumPy form.
Each item is a NumPy float32 array.

All of that processing is saved to a pandas DataFrame.
The problem is that the ‘main list’ is a NumPy object array.
Each row of the DataFrame looks like this:

file_wav         transcription        segmendata
path/file1.wav   'one day vacation'   array([[-0.00027978],
                                             [-0.00027978],
                                             [-0.00027978],
                                             [ 0.03359528]], dtype=float32)

So if I access just one row, like:

segment_data = train_pd['segmendata'][4]

it has the form shown in the DataFrame above, which is why I can easily plot it with matplotlib.

But when I access all of it with:

a = train_pd['segmendata'].values

the outer array wraps everything with dtype=object.

If, as you said, it comes from items with different shapes in pandas, the only possibility I see is that there is an empty item.

Well, I have to figure out how to check that.
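One way to check (a sketch; the toy Series below stands in for train_pd['segmendata'], and treating the most common shape as the expected one is an assumption) is to list the rows whose cell is empty or not an array, and the rows whose shape differs from the rest:

```python
from collections import Counter

import numpy as np
import pandas as pd

# Toy stand-in for train_pd['segmendata'] with one empty and one shorter segment
cells = np.empty(4, dtype=object)
cells[0] = np.zeros((5, 1), dtype=np.float32)
cells[1] = np.zeros((5, 1), dtype=np.float32)
cells[2] = np.zeros((0, 1), dtype=np.float32)  # empty segment
cells[3] = np.zeros((3, 1), dtype=np.float32)  # shorter segment
segments = pd.Series(cells)

# Cells that are empty or not NumPy arrays at all
bad_rows = [i for i, seg in segments.items()
            if not isinstance(seg, np.ndarray) or seg.size == 0]
print("empty or non-array rows:", bad_rows)  # [2]

# Cells whose shape differs from the most common shape (assumed to be the expected one)
shape_counts = Counter(seg.shape for seg in segments if isinstance(seg, np.ndarray))
common_shape, _ = shape_counts.most_common(1)[0]
odd_rows = [i for i, seg in segments.items()
            if getattr(seg, "shape", None) != common_shape]
print("most common shape:", common_shape)    # (5, 1)
print("rows with another shape:", odd_rows)  # [2, 3]
```

For variable-length speech segments, differing lengths are expected, so a non-empty odd_rows list does not by itself mean the data is broken.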