How to convert array to tensor?

wahyubram82 · May 19, 2020, 1:56pm

OK, I already checked: everything is fine, and none of the rows has empty data…

I don't know if this is right or not…
but I think I solved it, simply because the data now converts to a torch tensor.

What I did:
I created a new DataFrame from the values of the pandas column.

target = pd.DataFrame(train_pd['segmendata'])

type(target)
>> pandas.core.frame.DataFrame

len(target)
>> 1487

pt = torch.Tensor(np.array(target.drop('segmendata', axis=1).values.astype(np.float32)))
>> tensor([], size=(1487, 0))

Like I said, I'm a complete novice. Is this correct? I'm afraid I'll get an error in the next step.

Then I tried several other versions, and they run too… but that only makes me more confused…

Other commands that also run:

pt = torch.Tensor(np.array(target.drop('segmendata', axis=1).values))

pt = torch.Tensor(np.array(target.drop('segmendata', axis=1)))

The output is the same:
tensor([], size=(1487, 0))
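A likely explanation for the empty result: `target` has exactly one column, `segmendata`, so `target.drop('segmendata', axis=1)` removes the only column and leaves 1487 rows with 0 columns. Every variant above converts that empty frame, which is why each one prints `size=(1487, 0)`. The sketch below reproduces this on a tiny hypothetical DataFrame (two rows, made-up values) and shows that stacking the per-row arrays converts the actual data, which only works when all rows have the same length.

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical stand-in for train_pd: one column holding one NumPy array per row
train_pd = pd.DataFrame({"segmendata": [np.ones(3), np.ones(3)]})
target = pd.DataFrame(train_pd["segmendata"])

# Dropping the only column leaves a frame with zero columns
empty = target.drop("segmendata", axis=1)
t_empty = torch.tensor(empty.values.astype(np.float32))
print(empty.shape, t_empty.shape)  # (2, 0) torch.Size([2, 0])

# Stacking the per-row arrays converts the data itself (equal row lengths required)
stacked = torch.tensor(np.stack(target["segmendata"].to_list()), dtype=torch.float32)
print(stacked.shape)  # torch.Size([2, 3])
```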

wahyubram82 · May 19, 2020, 6:18pm

OK, many tutorials did not solve my problem, so I worked around it by not rushing to convert the whole pandas/NumPy column into one PyTorch tensor, because that was the part that stayed unsolved.

EDIT:
The conversion to torch failed because the NumPy arrays stored in the DataFrame have different shapes, not for any other reason.
My data is a time series (speech data), so it is impossible to change the shapes.
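A minimal reproduction of that failure, using two hypothetical utterances of 120 and 95 frames with 13 coefficients each (the shapes are made up for illustration): building one array or tensor from all rows needs identical shapes, while converting each row on its own does not.

```python
import numpy as np
import torch

# Hypothetical frame-level features: two utterances of different duration
rows = [np.zeros((120, 13), dtype=np.float32), np.zeros((95, 13), dtype=np.float32)]

# np.stack (and therefore one big tensor) needs every row to have the same shape
stack_failed = False
try:
    np.stack(rows)
except ValueError as err:
    stack_failed = True
    print("np.stack failed:", err)

# Converting row by row works, because each tensor keeps its own shape
tensors = [torch.from_numpy(r) for r in rows]
print([tuple(t.shape) for t in tensors])  # [(120, 13), (95, 13)]
```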

What I did instead was convert each row's NumPy array to a torch tensor and save it back into the DataFrame, so I have a column that holds torch tensors. I did it with:

import numpy as np
import torch
from sklearn import preprocessing

def convert_np_to_torch(numpy_data_from_one_file):
    # One row (one audio file) -> float32 tensor; each row keeps its own shape
    return torch.tensor(np.asarray(numpy_data_from_one_file, dtype=np.float32))

def convert_labeltorch(data_in_str_list):
    # A new encoder is fit on every call, so ids are only consistent within one row
    le = preprocessing.LabelEncoder()
    targets = le.fit_transform(data_in_str_list)
    return torch.as_tensor(targets)

def add_torchfeatures(df):
    # Operate on the DataFrame passed in, so the same function works for train_pd and test_pd
    df['torch_gfcc'] = df['gfcc'].apply(convert_np_to_torch)
    df['torch_mfcc'] = df['mfcc'].apply(convert_np_to_torch)
    df['torch_segment'] = df['segmentdata'].apply(convert_np_to_torch)

def add_torchlabel(df):
    df['torch_words'] = df['words'].apply(convert_labeltorch)
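Because `convert_labeltorch` fits a fresh `LabelEncoder` per row, the same word can receive different ids in different rows. If the labels are meant to share one mapping across the whole column, one option is to build a single vocabulary first. The sketch below swaps in a plain Python dictionary instead of scikit-learn's `LabelEncoder` to stay dependency-free; the `words` values are hypothetical.

```python
import pandas as pd
import torch

# Hypothetical 'words' column: one list of word labels per row
df = pd.DataFrame({"words": [["yes", "no"], ["no", "stop"]]})

# One vocabulary over every row, so "no" gets the same id everywhere
all_words = sorted({w for row in df["words"] for w in row})
vocab = {w: i for i, w in enumerate(all_words)}

df["torch_words"] = df["words"].apply(lambda row: torch.tensor([vocab[w] for w in row]))
print(vocab)  # {'no': 0, 'stop': 1, 'yes': 2}
print([t.tolist() for t in df["torch_words"]])  # [[2, 0], [0, 1]]
```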

Then I ran:


%%timeit
add_torchfeatures(train_pd)

247 ms ± 86.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

And for the test file:

%%timeit
add_torchfeatures(test_pd)

167 ms ± 3.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

EDIT:
As I explained before, my data is a time series (1 feature, 1 sequence).

After creating the tensor columns in pandas, here are the steps to feed them into a PyTorch Dataset and DataLoader.

1. Convert the DataFrame columns to lists:

train_segment_torchlist = train_pd['torch_segment'].to_list()
test_segment_torchlist = test_pd['torch_segment'].to_list()

train_mfcc_torchlist = train_pd['torch_mfcc'].to_list()
test_mfcc_torchlist = test_pd['torch_mfcc'].to_list()

train_gfcc_torchlist = train_pd['torch_gfcc'].to_list()
test_gfcc_torchlist = test_pd['torch_gfcc'].to_list()

2. Pad each list into a single tensor:

train_segment_torch = torch.nn.utils.rnn.pad_sequence(train_segment_torchlist, batch_first=True, padding_value=0)
test_segment_torch = torch.nn.utils.rnn.pad_sequence(test_segment_torchlist, batch_first=True, padding_value=0)

train_mfcc_torch = torch.nn.utils.rnn.pad_sequence(train_mfcc_torchlist, batch_first=True, padding_value=0)
test_mfcc_torch = torch.nn.utils.rnn.pad_sequence(test_mfcc_torchlist, batch_first=True, padding_value=0)

train_gfcc_torch = torch.nn.utils.rnn.pad_sequence(train_gfcc_torchlist, batch_first=True, padding_value=0)
test_gfcc_torch = torch.nn.utils.rnn.pad_sequence(test_gfcc_torchlist, batch_first=True, padding_value=0)

  2.a. If you want to combine the features first, you can do it with `torch.cat`.
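Here is a small sketch of steps 2 and 2.a on hypothetical MFCC/GFCC features (3 utterances, 13 coefficients, random values). It also records each sequence's true length before padding, since the zeros added by `pad_sequence` carry no information and a model may need to ignore them. Concatenating along the feature axis assumes both feature types have the same number of frames per utterance.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical features: 3 utterances with 5, 3 and 7 frames, 13 coefficients each
mfcc_list = [torch.randn(5, 13), torch.randn(3, 13), torch.randn(7, 13)]
gfcc_list = [torch.randn(5, 13), torch.randn(3, 13), torch.randn(7, 13)]

# Keep the true lengths before padding
lengths = torch.tensor([t.shape[0] for t in mfcc_list])

mfcc = pad_sequence(mfcc_list, batch_first=True, padding_value=0)  # (3, 7, 13)
gfcc = pad_sequence(gfcc_list, batch_first=True, padding_value=0)  # (3, 7, 13)

# Step 2.a: concatenate along the feature axis; frame counts must match
combined = torch.cat([mfcc, gfcc], dim=2)
print(lengths.tolist(), combined.shape)  # [5, 3, 7] torch.Size([3, 7, 26])
```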

3. Wrap each padded tensor in a PyTorch dataset:

from torch.utils.data import DataLoader, TensorDataset

train_segment_dataset = TensorDataset(train_segment_torch)
test_segment_dataset = TensorDataset(test_segment_torch)

train_mfcc_dataset = TensorDataset(train_mfcc_torch)
test_mfcc_dataset = TensorDataset(test_mfcc_torch)

train_gfcc_dataset = TensorDataset(train_gfcc_torch)
test_gfcc_dataset = TensorDataset(test_gfcc_torch)

4. Load it with a PyTorch DataLoader:

train_segment_loader = DataLoader(train_segment_dataset,batch_size=12)
test_segment_loader = DataLoader(test_segment_dataset,batch_size=12)

train_mfcc_loader = DataLoader(train_mfcc_dataset,batch_size=12)
test_mfcc_loader = DataLoader(test_mfcc_dataset,batch_size=12)

train_gfcc_loader = DataLoader(train_gfcc_dataset,batch_size=12)
test_gfcc_loader = DataLoader(test_gfcc_dataset,batch_size=12)
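The datasets above hold features only. A minimal sketch, assuming hypothetical integer class labels and the per-sequence lengths kept before padding: `TensorDataset` indexes every tensor along dimension 0, so features, labels and lengths can travel together, and the lengths let a recurrent model skip the padding via `pack_padded_sequence`. The GRU and its sizes are illustrative only.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_sequence
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: 6 utterances, 13 features per frame, varying frame counts
seqs = [torch.randn(n, 13) for n in (4, 6, 2, 5, 3, 6)]
labels = torch.tensor([0, 1, 0, 2, 1, 2])  # hypothetical class ids
lengths = torch.tensor([s.shape[0] for s in seqs])
features = pad_sequence(seqs, batch_first=True)  # (6, 6, 13)

# Each dataset item is (features, label, length)
dataset = TensorDataset(features, labels, lengths)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

rnn = torch.nn.GRU(input_size=13, hidden_size=8, batch_first=True)
for x, y, n in loader:
    # Packing makes the GRU stop at each sequence's true length
    packed = pack_padded_sequence(x, n, batch_first=True, enforce_sorted=False)
    _, h_n = rnn(packed)
    print(x.shape, y.shape, h_n.shape)
```

With 6 items and `batch_size=4`, the loader yields one batch of 4 and one of 2.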

Hope it helps others…

BinhMinhs10 · July 13, 2020, 11:01am

How do I convert temp = [[1.], [1., -1.]] with torch.tensor()?

ptrblck · July 14, 2020, 9:21am

You cannot convert these nested lists with varying shapes to tensors currently.
Once nested tensors are in a stable state, this should be possible.
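Until then, a common workaround is to convert each inner list separately and pad them to a common length, as in the sketch below on the exact `temp` from the question. Zero padding is an assumption here: it changes the data, so the original lengths have to be tracked if the padding matters downstream. Recent PyTorch releases also ship a prototype `torch.nested` module for ragged data.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

temp = [[1.], [1., -1.]]

# The inner lists have lengths 1 and 2, so a direct conversion raises ValueError
direct_ok = True
try:
    torch.tensor(temp)
except ValueError:
    direct_ok = False

# Workaround: one tensor per inner list, then right-pad with zeros to a common length
padded = pad_sequence([torch.tensor(row) for row in temp], batch_first=True, padding_value=0.0)
print(padded.tolist())  # [[1.0, 0.0], [1.0, -1.0]]
```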

mehran2020 · July 17, 2020, 5:42pm

Thank you so much. This was exactly my problem. The raised error is not at all informative. It just says: Memory Error …

ptrblck · July 18, 2020, 2:33am

Could you post the complete error message, as it doesn’t sound like the expected error you should be seeing?

mehran2020 · July 18, 2020, 9:28am

That was the error message. Literally, Memory Error …

Now it is fixed. The problem was what you just said. I also had to make sure that, after converting the NumPy arrays to tensors, they did not end up as float64. I was doing all this in a Jupyter notebook, mind you.
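For reference, NumPy creates float64 arrays by default, and `torch.from_numpy` keeps that dtype, so each element takes 8 bytes instead of the 4 used by float32. A minimal sketch with a hypothetical 1000×40 feature matrix shows the difference and one way to cast down before (or after) conversion.

```python
import numpy as np
import torch

arr = np.random.rand(1000, 40)                  # NumPy defaults to float64
t64 = torch.from_numpy(arr)                     # keeps float64, shares memory with arr
t32 = torch.from_numpy(arr.astype(np.float32))  # cast first; t64.float() also works

print(t64.dtype, t64.element_size())  # torch.float64 8
print(t32.dtype, t32.element_size())  # torch.float32 4
```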