OK, many tutorials didn't solve my problem, so I solved it by not rushing to convert the pandas/NumPy data into a single PyTorch tensor, because that conversion was the main problem that remained unsolved.
EDIT:
The conversion to torch failed because the NumPy arrays in the rows of the DataFrame (panel data) have different sizes, not for any other reason.
My data is a time series derived from speech, so it is impossible to change the shape: each recording has its own length.
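To illustrate why the direct conversion fails, here is a minimal sketch with two made-up arrays of different lengths standing in for two speech files (the shapes 50×13 and 80×13 are assumptions, not the real data). Stacking them into one regular array fails, while converting each row on its own works:

```python
import numpy as np
import torch

# Two hypothetical "files" with different numbers of frames (13 coefficients each)
rows = [np.random.rand(50, 13), np.random.rand(80, 13)]

# Stacking ragged rows into one regular array raises ValueError
try:
    np.stack(rows)
    stacked_ok = True
except ValueError:
    stacked_ok = False

# Converting row by row works: one tensor per file, each keeping its own length
per_row = [torch.tensor(r, dtype=torch.float32) for r in rows]

print(stacked_ok)                  # False
print([t.shape for t in per_row])  # [torch.Size([50, 13]), torch.Size([80, 13])]
```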
What I did was convert each row's NumPy array to a torch tensor and save it back into the DataFrame, so I have columns that hold torch tensors. I did it with:
import numpy as np
import torch
from sklearn import preprocessing

def convert_np_to_torch(numpy_data_from_one_file):
    # One row (one file) -> float32 tensor; rows may have different lengths
    return torch.Tensor(np.array(numpy_data_from_one_file, dtype=np.float32))

def convert_labeltorch(data_in_str_list):
    # Fitted per row: the integer codes are consistent only within one row
    le = preprocessing.LabelEncoder()
    targets = le.fit_transform(data_in_str_list)
    return torch.as_tensor(targets)

def add_torchfeatures(df):
    # Operates on the DataFrame passed in, so it works for train_pd and test_pd
    df['torch_gfcc'] = df['gfcc'].apply(convert_np_to_torch)
    df['torch_mfcc'] = df['mfcc'].apply(convert_np_to_torch)
    df['torch_segment'] = df['segmentdata'].apply(convert_np_to_torch)

def add_torchlabel(df):
    df['torch_words'] = df['words'].apply(convert_labeltorch)
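One thing to watch in `convert_labeltorch`: because a new `LabelEncoder` is fitted on every row, the same word can get different integer codes in different rows. Below is a minimal sketch of fitting the encoder once on all words and then encoding each row, assuming (as the function's parameter name suggests) that each row of `words` holds a list of strings; the toy DataFrame is made up for illustration:

```python
import pandas as pd
import torch
from sklearn import preprocessing

# Toy DataFrame: each row of 'words' is a list of strings (assumption)
df = pd.DataFrame({"words": [["yes", "no"], ["no", "stop"], ["stop"]]})

# Fit one encoder on the vocabulary gathered from all rows
le = preprocessing.LabelEncoder()
le.fit([w for row in df["words"] for w in row])

# Encode each row with the shared encoder so codes match across rows
df["torch_words"] = df["words"].apply(lambda row: torch.as_tensor(le.transform(row)))

print(le.classes_.tolist())  # ['no', 'stop', 'yes']
print(df["torch_words"].tolist())
```

For the test set, reuse the same fitted `le` with `transform` only, so both sets share one mapping.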
Then I ran:
%%timeit
add_torchfeatures(train_pd)
247 ms ± 86.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
And for the test file:
%%timeit
add_torchfeatures(test_pd)
167 ms ± 3.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
EDIT:
As I explained before, my data is a time series (1 feature, 1 sequence).
After creating the tensor columns in the pandas DataFrame, here are the steps to feed them into a PyTorch Dataset and DataLoader:
1. Convert the DataFrame columns to Python lists:
train_segment_torchlist = train_pd['torch_segment'].to_list()
test_segment_torchlist = test_pd['torch_segment'].to_list()
train_mfcc_torchlist = train_pd['torch_mfcc'].to_list()
test_mfcc_torchlist = test_pd['torch_mfcc'].to_list()
train_gfcc_torchlist = train_pd['torch_gfcc'].to_list()
test_gfcc_torchlist = test_pd['torch_gfcc'].to_list()
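2. Pad each list into a single tensor. `pad_sequence` zero-pads every sequence along its first dimension up to the longest one in the list, so the tensors in a list may differ in length but must have the same trailing dimensions. The train and test sets are padded separately here, so their padded lengths can differ: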
train_segment_torch = torch.nn.utils.rnn.pad_sequence(train_segment_torchlist, batch_first=True, padding_value=0)
test_segment_torch = torch.nn.utils.rnn.pad_sequence(test_segment_torchlist, batch_first=True, padding_value=0)
train_mfcc_torch = torch.nn.utils.rnn.pad_sequence(train_mfcc_torchlist, batch_first=True, padding_value=0)
test_mfcc_torch = torch.nn.utils.rnn.pad_sequence(test_mfcc_torchlist, batch_first=True, padding_value=0)
train_gfcc_torch = torch.nn.utils.rnn.pad_sequence(train_gfcc_torchlist, batch_first=True, padding_value=0)
test_gfcc_torch = torch.nn.utils.rnn.pad_sequence(test_gfcc_torchlist, batch_first=True, padding_value=0)
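2.a. If you want to combine the features first, you can do it with `torch.cat`; the tensors must have the same size in every dimension except the one you concatenate along.
3. Wrap each tensor in a PyTorch `TensorDataset`: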
from torch.utils.data import TensorDataset, DataLoader

train_segment_dataset = TensorDataset(train_segment_torch)
test_segment_dataset = TensorDataset(test_segment_torch)
train_mfcc_dataset = TensorDataset(train_mfcc_torch)
test_mfcc_dataset = TensorDataset(test_mfcc_torch)
train_gfcc_dataset = TensorDataset(train_gfcc_torch)
test_gfcc_dataset = TensorDataset(test_gfcc_torch)
4. Load each dataset with a PyTorch `DataLoader`:
train_segment_loader = DataLoader(train_segment_dataset, batch_size=12)
test_segment_loader = DataLoader(test_segment_dataset, batch_size=12)
train_mfcc_loader = DataLoader(train_mfcc_dataset, batch_size=12)
test_mfcc_loader = DataLoader(test_mfcc_dataset, batch_size=12)
train_gfcc_loader = DataLoader(train_gfcc_dataset, batch_size=12)
test_gfcc_loader = DataLoader(test_gfcc_dataset, batch_size=12)
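The four steps can also be done with a single dataset that returns all features together with a label, which avoids keeping six loaders in sync. This is a minimal sketch on random tensors; the feature sizes (13 MFCC and 13 GFCC coefficients per frame), the one-label-per-file targets, and the name `lengths` are assumptions for illustration, and it assumes MFCC and GFCC have the same number of frames per file. It pads both features, concatenates them along the feature dimension (step 2.a), and keeps the original lengths so padded frames can be ignored later:

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Hypothetical per-file features: (frames, coefficients), frames differ per file
frame_counts = [50, 80, 65]
mfcc_list = [torch.randn(n, 13) for n in frame_counts]
gfcc_list = [torch.randn(n, 13) for n in frame_counts]
labels = torch.tensor([0, 2, 1])  # assumed: one class index per file

# Original lengths, kept so padded frames can be masked or packed later
lengths = torch.tensor(frame_counts)

# Pad to the longest file: shape (files, max_frames, 13)
mfcc = pad_sequence(mfcc_list, batch_first=True, padding_value=0)
gfcc = pad_sequence(gfcc_list, batch_first=True, padding_value=0)

# Step 2.a: concatenate along the feature dimension -> (files, max_frames, 26)
features = torch.cat([mfcc, gfcc], dim=-1)

# One dataset yields (features, length, label) for each file
dataset = TensorDataset(features, lengths, labels)
loader = DataLoader(dataset, batch_size=2, shuffle=True)

for x, n, y in loader:
    print(x.shape, n.tolist(), y.tolist())
```

Because all tensors live in one `TensorDataset`, `shuffle=True` shuffles features, lengths and labels together; with separate loaders, independent shuffling would break the alignment between them. Before an RNN, the lengths can be passed to `torch.nn.utils.rnn.pack_padded_sequence(x, n, batch_first=True, enforce_sorted=False)` so the padded frames are skipped.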
Hope it helps others…