Load dataframe in torch

(Adithya Samavedhi) #1

Hi guys,
I'm kind of new to PyTorch.
I have a dataframe where one column is the output and the other column contains values of type ndarray. How am I supposed to load it from my pandas dataframe into torch so I can use it in a neural network model or CNN? I have attached a snapshot of my dataframe.
Thanks in advance.

(Pranavan Theivendiram) #2

Try this.

import torch

torch_tensor_output = torch.tensor(df['output'].values)
torch_tensor_vectors = torch.from_numpy(df['vector'].values)

Hope this helps.


We pass iterator=True to pandas' read_csv() so that the CSV file is read into memory in batches. (Without the iterator parameter, the whole CSV file is read into memory at once, and for a file with a large amount of data the memory will not be enough.) The code is as follows:

import pandas as pd
import torch
import torch.utils.data as data


class FaceLandmarksDataset(data.Dataset):
    """Face Landmarks dataset."""

    def __init__(self, csv_file):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
        """
        # iterator=True returns a reader object instead of loading the whole file
        self.landmarks_frame = pd.read_csv(csv_file, iterator=True)

    def __len__(self):
        # Hard-coded size from the original post; adjust it to your own file
        return 1800000

    def __getitem__(self, idx):
        print(idx)
        # idx is ignored: every call returns the next 128 rows of the file
        landmarks = self.landmarks_frame.get_chunk(128).to_numpy().astype('float')
        return landmarks


filename = '/media/czn/e04e3ecf-cf63-416c-afd7-6d737e09968a/zhongkeyuan/dataset/CSV/HGG_pandas.csv'
dataset = FaceLandmarksDataset(filename)
train_loader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=True)
for batch in train_loader:
    print(batch)
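
Because the reader above hands out chunks in file order regardless of idx, the same idea can also be written as an iterable-style dataset. This is only a sketch under some assumptions: the class name CsvChunkDataset, the chunk size, and the small temporary CSV are made up for illustration, and every column is assumed to be numeric.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import torch
from torch.utils.data import DataLoader, IterableDataset


class CsvChunkDataset(IterableDataset):
    """Hypothetical helper: yields one float tensor per chunk of CSV rows."""

    def __init__(self, csv_file, chunksize):
        self.csv_file = csv_file
        self.chunksize = chunksize

    def __iter__(self):
        # Re-open the reader on every pass so each epoch starts at the first row.
        for chunk in pd.read_csv(self.csv_file, chunksize=self.chunksize):
            yield torch.from_numpy(chunk.to_numpy(dtype=np.float32))


# Small throwaway CSV (10 rows, 2 numeric columns) so the sketch runs anywhere.
tmp_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
demo = pd.DataFrame(np.arange(20, dtype=float).reshape(10, 2), columns=["a", "b"])
demo.to_csv(tmp_path, index=False)

# batch_size=None turns off automatic batching: each chunk already is a batch.
loader = DataLoader(CsvChunkDataset(tmp_path, chunksize=4), batch_size=None)
shapes = [tuple(batch.shape) for batch in loader]
print(shapes)  # [(4, 2), (4, 2), (2, 2)]
```

Note that an iterable-style dataset cannot be shuffled by the DataLoader, so if you need random order you would have to shuffle the file beforehand or shuffle within each chunk.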

(Adithya Samavedhi) #4

I tried that, but the second command gives an error:
torch_tensor_vectors = torch.from_numpy(df['vector'].values)
can't convert a given np.ndarray to a tensor - it has an invalid type. The only supported types are: double, float, int64, int32, and uint8.
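
An error like this usually appears when every cell of the column holds its own ndarray, because .values then returns an array of dtype object, which torch.from_numpy rejects. A possible workaround, sketched here on a made-up two-row frame and assuming all the per-row arrays have the same shape, is to stack them into one numeric array first:

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical stand-in for the poster's frame: each 'vector' cell is an ndarray.
df = pd.DataFrame({
    "vector": [np.array([0.1, 0.2, 0.3]), np.array([0.4, 0.5, 0.6])],
    "output": [0, 1],
})

# The column is stored with dtype=object, which is what triggers the error.
print(df["vector"].values.dtype)  # object

# Stack the per-row arrays into a single (n_rows, n_features) float array.
vectors = np.stack(df["vector"].to_list()).astype(np.float32)
torch_tensor_vectors = torch.from_numpy(vectors)
torch_tensor_output = torch.tensor(df["output"].to_numpy())

print(torch_tensor_vectors.shape)  # torch.Size([2, 3])
```

If the arrays differ in shape, np.stack raises a ValueError, and the rows would need padding or cropping to a common shape first.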

(Adithya Samavedhi) #5

How will the program know which column is my label and which column is my matrix?
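
One common way to make that explicit is to wrap the frame in a Dataset whose __getitem__ returns an (input, label) pair, so the training loop gets them in a fixed order. The sketch below is only an illustration: the class name FrameDataset and the toy data are made up, the column names 'vector' and 'output' are taken from the earlier reply, and the labels are assumed to be integer class ids.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset


class FrameDataset(Dataset):
    """Hypothetical wrapper: feature_col holds the inputs, label_col the labels."""

    def __init__(self, frame, feature_col="vector", label_col="output"):
        stacked = np.stack(frame[feature_col].to_list()).astype(np.float32)
        self.x = torch.from_numpy(stacked)
        # torch.long assumes integer class labels; use a float dtype for regression.
        self.y = torch.tensor(frame[label_col].to_numpy(), dtype=torch.long)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        # The position in the returned tuple tells the training loop which is which.
        return self.x[idx], self.y[idx]


# Toy frame: six rows, each with a length-4 vector and a 0/1 label.
df = pd.DataFrame({
    "vector": [np.arange(4, dtype=float) + i for i in range(6)],
    "output": [0, 1, 0, 1, 0, 1],
})

loader = DataLoader(FrameDataset(df), batch_size=2, shuffle=False)
features, labels = next(iter(loader))
print(features.shape, labels.tolist())  # torch.Size([2, 4]) [0, 1]
```

In the training loop you would then unpack each batch the same way, for example "for features, labels in loader:", and pass features to the model and labels to the loss.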

(Pranavan Theivendiram) #6

Hi Adithya,

Could you please share your pandas DataFrame here?

I can have a look.


Are there any missing values, NaNs, or strings in the data? Make sure there are none.
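
A quick way to check might look like the sketch below. The frame is invented for the example (one clean row, one row with a NaN inside the vector, one row with a string and a missing label), and is_clean_vector is a hypothetical helper, not a pandas or torch function.

```python
import numpy as np
import pandas as pd

# Made-up frame with deliberately bad rows at index 1 and 2.
df = pd.DataFrame({
    "vector": [np.array([1.0, 2.0]), np.array([np.nan, 4.0]), "oops"],
    "output": [0, 1, None],
})


def is_clean_vector(cell):
    """True only for numeric ndarrays that contain no NaN entries."""
    return (
        isinstance(cell, np.ndarray)
        and np.issubdtype(cell.dtype, np.number)
        and not np.isnan(cell).any()
    )


vector_ok = df["vector"].map(is_clean_vector)
label_ok = df["output"].notna()

# Index labels of the rows that would break the conversion to a tensor.
bad_rows = df.index[~(vector_ok & label_ok)].tolist()
print(bad_rows)  # [1, 2]
```

Rows listed in bad_rows can then be dropped with df.drop(index=bad_rows) or repaired before converting the columns to tensors.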