I'm kind of new to PyTorch.
I have a DataFrame where one column is the output and the other column contains values of type ndarray. How am I supposed to load it from my pandas DataFrame into torch so I can use it in a neural network model or CNN? I have attached a snapshot of my DataFrame.
Thanks in advance.
import torch
torch_tensor_output = torch.tensor(df['output'].values)
torch_tensor_vectors = torch.from_numpy(df['vector'].values)
Hope this helps.
We use iterator=True in the pandas read_csv() function to read the CSV file into memory in batches. (If you don't use the iterator parameter, the whole CSV file is read into memory at once, and for a file with a large amount of data there is definitely not enough memory.) The code is as follows:
import pandas as pd
import torch
import torch.utils.data as data


class FaceLandmarksDataset(data.Dataset):
    """Face Landmarks dataset."""

    def __init__(self, csv_file):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
        """
        # iterator=True returns a reader that yields the file chunk by chunk
        self.landmarks_frame = pd.read_csv(csv_file, iterator=True)

    def __len__(self):
        # Hard-coded: the chunked reader does not support len()
        return 1800000

    def __getitem__(self, idx):
        print(idx)
        # idx is ignored: each call returns the next 128 rows of the file
        landmarks = self.landmarks_frame.get_chunk(128).to_numpy().astype('float')
        return landmarks


filename = '/media/czn/e04e3ecf-cf63-416c-afd7-6d737e09968a/zhongkeyuan/dataset/CSV/HGG_pandas.csv'
dataset = FaceLandmarksDataset(filename)
train_loader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=True)
for batch in train_loader:
    print(batch)
I tried that, but the second command gives an error.
torch_tensor_vectors = torch.from_numpy(df['vector'].values)
gives the error:
can't convert a given np.ndarray to a tensor - it has an invalid type. The only supported types are: double, float, int64, int32, and uint8.
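One likely cause, assuming each cell of the vector column holds a NumPy array of the same shape: df['vector'].values is then a one-dimensional array of dtype object, and torch.from_numpy cannot convert object arrays. A minimal sketch of a workaround is to stack the per-row arrays into one numeric array first. The small DataFrame below is made up for illustration; only the column names vector and output come from the thread.

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical stand-in for the DataFrame in the question.
df = pd.DataFrame({
    "vector": [np.array([0.1, 0.2, 0.3]), np.array([0.4, 0.5, 0.6])],
    "output": [0, 1],
})

# A column of arrays is stored with dtype=object, which from_numpy rejects.
print(df["vector"].values.dtype)

# Stack the per-row arrays into a single (n_rows, n_features) float array.
# Assumption: every array in the column has the same shape.
stacked = np.stack(df["vector"].tolist()).astype(np.float32)

torch_tensor_vectors = torch.from_numpy(stacked)
torch_tensor_output = torch.tensor(df["output"].to_numpy())

print(torch_tensor_vectors.shape)
print(torch_tensor_output.shape)
```

If the arrays have different shapes, np.stack raises an error and the rows would need padding or per-sample handling instead.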
How will the program know which column is my label and which column is my matrix?
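As far as I know, the program does not infer this. You decide it by the order in which each sample is returned and unpacked. A minimal sketch, assuming the features and labels are already tensors (the values below are placeholders): TensorDataset pairs them row by row, and each batch from the DataLoader comes back as a (features, labels) pair in that same order.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder tensors: 4 samples with 3 features each, and one label per sample.
features = torch.arange(12, dtype=torch.float32).reshape(4, 3)
labels = torch.tensor([0, 1, 0, 1])

# The first tensor acts as the input and the second as the target only
# because the loop below unpacks them in this order.
dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=2, shuffle=False)

for batch_features, batch_labels in loader:
    print(batch_features.shape, batch_labels)
```

In a training loop you would pass batch_features to the model and compare its output with batch_labels in the loss function.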
Could you please share your pandas DataFrame here?
I can have a look.
Are there any missing values, NaN, or strings in the data? Make sure there are none.
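A quick way to check this, sketched on a made-up DataFrame (the column names follow the thread, the values are invented):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second vector contains a NaN on purpose.
df = pd.DataFrame({
    "vector": [np.array([0.1, 0.2]), np.array([0.3, np.nan])],
    "output": [0, 1],
})

# dtype of each column; a column of arrays or strings shows up as object.
print(df.dtypes)

# Count missing values in the scalar label column.
print(df["output"].isna().sum())

# NaNs inside the per-row arrays need an element-wise check on each array.
rows_with_nan = df["vector"].apply(lambda v: bool(np.isnan(v).any()))
print(rows_with_nan.tolist())
```

Rows flagged here would need to be dropped or filled before converting the column to a tensor.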