How to convert array to tensor?

OK, I already checked and everything is fine… none of the rows has empty data…

I don’t know if this is right or not…
but I think I can solve it, since the data can be transformed into torch form.

What I did:
I created a new DataFrame from the values of the pandas column.

target = pd.DataFrame(train_pd['segmendata'])

type(target)
>> pandas.core.frame.DataFrame

len(target)
>> 1487

pt = torch.Tensor(np.array(target.drop('segmendata', axis=1).values.astype(np.float32)))
>> tensor([], size=(1487, 0))

Like I said, I’m the champ of novices: is this correct? I’m afraid I’ll get an error in the next step.

Well, then I tried several other snippets. They work too… but they make me more confused…

Other commands that work are:

pt = torch.Tensor(np.array(target.drop('segmendata', axis=1).values))

pt = torch.Tensor(np.array(target.drop('segmendata', axis=1)))

The output is the same:
tensor([], size=(1487, 0))
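
Note that `size=(1487, 0)` means the tensor has 1487 rows but zero columns, so it holds no data at all: `target` contains only the `segmendata` column, and `drop` removes exactly that column. A small sketch that reproduces this, using a hypothetical two-row frame as a stand-in for `train_pd`:

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical stand-in for train_pd: one column of variable-length arrays.
toy = pd.DataFrame({"segmendata": [np.zeros(3), np.zeros(5)]})

target = pd.DataFrame(toy["segmendata"])
remaining = target.drop("segmendata", axis=1)  # drops the only column
print(remaining.shape)

# Converting a frame with no columns yields an empty tensor.
empty = torch.Tensor(remaining.values.astype(np.float32))
print(empty.shape, empty.numel())
```

So the conversion "works" only because nothing is left to convert.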

OK, many tutorials did not solve my problem. So I solved it by not rushing to transform the pandas/NumPy data into a PyTorch tensor in one step, because that conversion was the main problem left unsolved.

EDIT:
The conversion to torch failed because the NumPy arrays stored in the pandas column have different sizes, not for any other reason.
Because my data is a series (it comes from speech data), it is impossible to change the shapes.
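
To illustrate why differently sized arrays cannot be turned into one tensor directly, here is a minimal sketch with two made-up arrays (the lengths 3 and 5 are arbitrary, not taken from the speech data):

```python
import numpy as np
import torch

# Two hypothetical sequences of different lengths.
ragged = [np.zeros(3, dtype=np.float32), np.zeros(5, dtype=np.float32)]

# NumPy cannot build a rectangular float32 array from them.
try:
    np.array(ragged, dtype=np.float32)
    numpy_ok = True
except ValueError:
    numpy_ok = False

# torch.stack also requires every tensor to have the same size.
try:
    torch.stack([torch.from_numpy(a) for a in ragged])
    stack_ok = True
except RuntimeError:
    stack_ok = False

print(numpy_ok, stack_ok)

# Converting each array on its own is always possible.
per_row = [torch.from_numpy(a) for a in ragged]
print([t.shape for t in per_row])
```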

What I did instead: I converted the NumPy data in each row to a torch tensor and saved it back to pandas, so I now have columns whose values are torch tensors. I did it with:

import numpy as np
import torch
from sklearn import preprocessing

def convert_np_to_torch(numpy_data_from_one_file):
    # One row's array -> float32 tensor (rows may differ in length).
    return torch.Tensor(np.array(numpy_data_from_one_file, dtype=np.float32))

def convert_labeltorch(data_in_str_list):
    # Encode string labels as integer codes; the encoder is refit on every call.
    le = preprocessing.LabelEncoder()
    targets = le.fit_transform(data_in_str_list)
    return torch.as_tensor(targets)

def add_torchfeatures(df):
    # Operate on the DataFrame passed in, not on the global train_pd.
    df['torch_gfcc'] = df['gfcc'].apply(convert_np_to_torch)
    df['torch_mfcc'] = df['mfcc'].apply(convert_np_to_torch)
    df['torch_segment'] = df['segmentdata'].apply(convert_np_to_torch)

def add_torchlabel(df):
    df['torch_words'] = df['words'].apply(convert_labeltorch)

Then I run:


%%timeit
add_torchfeatures(train_pd)

247 ms ± 86.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

and for the test file:

%%timeit
add_torchfeatures(test_pd)

167 ms ± 3.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
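
A side note on `convert_labeltorch`: because a new `LabelEncoder` is fitted on every call, applying it row by row gives integer codes that are not comparable across rows. A minimal sketch of fitting it once over the whole column instead, assuming `words` holds one string label per row (the toy frame and its values are hypothetical):

```python
import pandas as pd
import torch
from sklearn import preprocessing

# Hypothetical label column: one word per utterance.
toy = pd.DataFrame({"words": ["yes", "no", "yes", "stop"]})

# Fit once so the same word always maps to the same integer.
le = preprocessing.LabelEncoder()
labels = torch.as_tensor(le.fit_transform(toy["words"]))

print(list(le.classes_))  # classes are stored in sorted order
print(labels.tolist())
```

Keeping `le` around also lets you call `le.transform` on the test labels and `le.inverse_transform` on predictions.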

EDIT:
As I explained before, my data is a data series (1 feature, 1 sequence).

After creating the torch columns in pandas, here are the steps to feed them into a PyTorch dataset and dataloader.

  1. Convert the pandas columns to Python lists.
train_segment_torchlist = train_pd['torch_segment'].to_list()
test_segment_torchlist = test_pd['torch_segment'].to_list()

train_mfcc_torchlist = train_pd['torch_mfcc'].to_list()
test_mfcc_torchlist = test_pd['torch_mfcc'].to_list()

train_gfcc_torchlist = train_pd['torch_gfcc'].to_list()
test_gfcc_torchlist = test_pd['torch_gfcc'].to_list()
  2. Pad each list into a single torch tensor.
train_segment_torch = torch.nn.utils.rnn.pad_sequence(train_segment_torchlist, batch_first=True, padding_value=0)
test_segment_torch = torch.nn.utils.rnn.pad_sequence(test_segment_torchlist, batch_first=True, padding_value=0)

train_mfcc_torch = torch.nn.utils.rnn.pad_sequence(train_mfcc_torchlist, batch_first=True, padding_value=0)
test_mfcc_torch = torch.nn.utils.rnn.pad_sequence(test_mfcc_torchlist, batch_first=True, padding_value=0)

train_gfcc_torch = torch.nn.utils.rnn.pad_sequence(train_gfcc_torchlist, batch_first=True, padding_value=0)
test_gfcc_torch = torch.nn.utils.rnn.pad_sequence(test_gfcc_torchlist, batch_first=True, padding_value=0)
  2.a. If you want to combine them first, you can do so with `torch.cat`.
  3. Wrap each tensor in a PyTorch dataset.
from torch.utils.data import DataLoader, TensorDataset
train_segment_dataset = TensorDataset(train_segment_torch)
test_segment_dataset = TensorDataset(test_segment_torch)

train_mfcc_dataset = TensorDataset(train_mfcc_torch)
test_mfcc_dataset = TensorDataset(test_mfcc_torch)

train_gfcc_dataset = TensorDataset(train_gfcc_torch)
test_gfcc_dataset = TensorDataset(test_gfcc_torch)
  4. Load each dataset with a PyTorch DataLoader:
train_segment_loader = DataLoader(train_segment_dataset,batch_size=12)
test_segment_loader = DataLoader(test_segment_dataset,batch_size=12)

train_mfcc_loader = DataLoader(train_mfcc_dataset,batch_size=12)
test_mfcc_loader = DataLoader(test_mfcc_dataset,batch_size=12)

train_gfcc_loader = DataLoader(train_gfcc_dataset,batch_size=12)
test_gfcc_loader = DataLoader(test_gfcc_dataset,batch_size=12)
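
The steps above can be tried end to end on toy data. This is only a sketch: the three sequences below are made-up stand-ins for a column such as `train_pd['torch_segment']`, and storing the original lengths next to the padded data is my own addition (an assumption about what a model may need later, e.g. to ignore the padding).

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for a list of per-row tensors of different lengths.
sequences = [torch.ones(3), torch.ones(5), torch.ones(2)]

# Remember the true lengths so padded positions can be ignored later.
lengths = torch.tensor([len(s) for s in sequences])

# Pad to the longest sequence -> shape (num_sequences, max_len).
padded = pad_sequence(sequences, batch_first=True, padding_value=0.0)
print(padded.shape)

# Store data and lengths together so each batch carries both.
dataset = TensorDataset(padded, lengths)
loader = DataLoader(dataset, batch_size=2)

for batch, batch_lengths in loader:
    print(batch.shape, batch_lengths.tolist())
```

Every row is padded with zeros up to the longest sequence in the whole list, so very uneven lengths cost extra memory.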

Hope this helps others…

How do I convert temp = [[1.], [1., -1.]] to a torch.tensor()?

You cannot convert these nested lists with varying shapes to tensors currently.
Once nested tensors are in a stable state, this should be possible.
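
In the meantime, one common workaround is to pad the shorter rows so that the result is rectangular. A minimal sketch using `pad_sequence` (the choice of 0.0 as the padding value is an assumption; pick whatever is neutral for your data):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

temp = [[1.], [1., -1.]]

# Convert each inner list separately, then pad to a common length.
rows = [torch.tensor(r) for r in temp]
lengths = [len(r) for r in temp]  # original lengths, in case they are needed
padded = pad_sequence(rows, batch_first=True, padding_value=0.0)

print(padded)
print(lengths)
```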


Thank you so much. This was exactly my problem. The raised error is not at all informative. It just says: Memory Error …

Could you post the complete error message, as it doesn’t sound like the expected error you should be seeing?

That was the error message. Literally, Memory Error …

Now it is fixed. The problem was what you just said. I had to make sure that after converting the numpy arrays to tensors, they wouldn’t end up as float64. I was doing all this in a Jupyter notebook, mind you.
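
For anyone hitting the same dtype issue, here is a small illustration (not the code from my notebook, just a made-up array): NumPy creates float64 arrays by default, `torch.from_numpy` keeps that dtype, and float64 takes twice the memory per element of float32.

```python
import numpy as np
import torch

arr = np.ones((4, 3))        # NumPy default dtype is float64
t64 = torch.from_numpy(arr)  # dtype is preserved: torch.float64

# Cast on the NumPy side before converting...
t32 = torch.from_numpy(arr.astype(np.float32))
# ...or cast the tensor afterwards.
t32_alt = t64.float()

print(t64.dtype, t32.dtype, t32_alt.dtype)
print(t64.element_size(), t32.element_size())  # bytes per element
```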