Hello! I've been working with GCNs recently and built my own dataset of about 150,000 samples, each stored as a torch_geometric Data object. Specifically, I subclassed Data and added an extra attribute z, a second target, because my model has two targets:
from torch_geometric.data import Data

class MyData(Data):
    def __init__(self, x, y, z, edge_index, pos):
        super(MyData, self).__init__(x=x, y=y, edge_index=edge_index, pos=pos)
        self.z = z
I save each sample (named sample) to its own file using torch.save:
torch.save(sample, 'new_graph_data/{}.pt'.format(str(image_id)+'_'+str(gt_id)))
As a result, I have 150,000 .pt files, and loading them the usual way is EXTREMELY slow:
import os
import torch
from torch_geometric.data import Dataset  # assumed import; get()/len() follow the PyG Dataset API

all_data_path = os.listdir('F:/CODE/Pytorch/Diplomer/new_graph_data/')
train_ = 5000 # for testing
test_ = 6000
train_data_path = all_data_path[:train_]
test_data_path = all_data_path[train_:test_]

class MyOwnDataset(Dataset):
    def __init__(self, root, mode):
        super(MyOwnDataset, self).__init__()
        self.root = root
        self.mode = mode

    def get(self, idx):
        # Every call opens and deserializes one small .pt file from disk.
        if self.mode == 'train':
            data = torch.load('F:/CODE/Pytorch/Diplomer/new_graph_data/'+train_data_path[idx])
        elif self.mode == 'test':
            data = torch.load('F:/CODE/Pytorch/Diplomer/new_graph_data/'+test_data_path[idx])
        return data

    def len(self):
        if self.mode == 'train':
            return train_
        elif self.mode == 'test':
            return test_-train_
After some research, I found that the slowdown comes from loading a huge number of small files (since I saved each .pt file separately). So I decided to use LMDB to store and load my data:
txn = env.begin(write=True)
tqdm_iter = tqdm(enumerate(zip(all_file_list, keys)), total=len(all_file_list), leave=False)
for idx, (path, key) in tqdm_iter:
    tqdm_iter.set_description('Write {}'.format(key))
    key_byte = key.encode('ascii')
    data = torch.load(path)
    txn.put(key_byte, data)
However, LMDB requires both keys and values to be bytes, and I have no idea how to convert my Data objects to bytes.
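For reference, here is a minimal sketch of one common way to get bytes out of a Python object: serialize it before storing and deserialize it after reading. It uses only the standard pickle module. The names to_bytes and from_bytes are hypothetical helpers, the sample is a plain dict standing in for a MyData object, and a dict stands in for the LMDB transaction so the snippet runs without lmdb or torch installed. Whether this round-trips a custom Data subclass correctly is an assumption that would need checking on the real data.

```python
import pickle


def to_bytes(obj):
    """Serialize a picklable Python object into a bytes value."""
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)


def from_bytes(blob):
    """Rebuild the object from bytes produced by to_bytes."""
    return pickle.loads(blob)


# Stand-in for one graph sample (hypothetical; real code would hold a MyData object).
sample = {
    "x": [[0.1, 0.2], [0.3, 0.4]],
    "y": 1,
    "z": 0,
    "edge_index": [[0, 1], [1, 0]],
}

# Stand-in for the LMDB transaction: a dict mapping bytes keys to bytes values.
store = {}
key_byte = "12_3".encode("ascii")

# Analogous to txn.put(key_byte, value_bytes): the value is now bytes.
store[key_byte] = to_bytes(sample)

# Analogous to reading with txn.get(key_byte) and decoding the result.
restored = from_bytes(store[key_byte])
```

With torch available, the same idea should work through an in-memory buffer: torch.save accepts a file-like object such as io.BytesIO, whose getvalue() returns bytes, and torch.load can read back from io.BytesIO(blob). Note that unpickling is only safe for data you wrote yourself.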
Could you give me some advice?
Or do you know of other efficient ways to load torch_geometric Data? Any ideas that would help me load faster are welcome.
Thank you so much!!