Hi all! I am relatively new to PyTorch and would like some help writing a dataset class, ideally one that reuses PyTorch's built-in dataset utilities such as DatasetFolder.
I am working on the ModelNet10 dataset, and my directory structure is as follows:
The root directory contains 10 folders (bathtub, bedroom, toilet, …). Each of these contains 2 folders (train and test), which in turn contain around 500 STL files. Right now, my dataset class looks like this:
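import os

import numpy as np
import torch
from sklearn.preprocessing import LabelEncoder
from torch.utils.data import Dataset

# input_stl is a helper defined elsewhere. Judging from its use below, it
# returns a dict of per-face arrays, or False when a file cannot be read.


class MeshData(Dataset):
    def __init__(self, root_dir):
        self.classes = set()
        self.X = []
        self.Y = []
        self.final_dataset = []
        self.classes_codec = LabelEncoder()
        lst_of_classes = sorted(os.listdir(root_dir))
        self.classes_codec.fit(lst_of_classes)
        for x in lst_of_classes:
            print(x)
            # Only the training split of each class is loaded.
            path = os.path.join(root_dir, x, 'train')
            lst_of_objects = os.listdir(path)
            for y in lst_of_objects:
                file_path = os.path.join(path, y)
                dict_para = input_stl(file_path)
                if dict_para is False:
                    print('Input file', y, 'has problems')
                    continue
                neigh = dict_para["neigh_index"]
                corner = dict_para["corners"]
                center = dict_para["centroids"]
                normal = dict_para["normals"]
                # One row per face: centroid, corners, normal, neighbour indices.
                self.X.append(np.concatenate((center, corner, normal, neigh), axis=1))
                self.Y.append(self.one_hot_encode(self.classes_codec, [x]))
        self.final_dataset = [self.X, self.Y]

    def one_hot_encode(self, codec, values):
        # Despite the name, this returns the integer class index: the one-hot
        # row is built and then reduced back to its argmax.
        value_idxs = codec.transform(values)
        val, idx = torch.max(torch.eye(len(codec.classes_))[value_idxs], 1)
        return idx  # already a LongTensor

    def __len__(self):
        # final_dataset always has exactly two entries (X and Y), so the
        # number of samples has to come from X instead.
        return len(self.X)

    def __getitem__(self, idx):
        return torch.from_numpy(self.X[idx]), self.Y[idx]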
Here, I am brute-forcing over all folders and files, loading them in the format I want and one-hot encoding the labels. But this is just too time-consuming.
I looked through the documentation and found that torchvision.datasets.DatasetFolder does a similar job, but it expects a directory structure like this:
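One way to cut the start-up cost might be to record only the file paths in __init__ and defer the actual parsing to __getitem__, so that each mesh is read only when it is requested. Below is a minimal sketch of the indexing step, using only the standard library. The function name index_split and its parameters are made up for illustration, and it assumes the root/class/train layout described above.

```python
import os


def index_split(root_dir, split="train", ext=".stl"):
    """Collect (file_path, class_index) pairs without reading any mesh."""
    # Class names are the sub-directories of root_dir, sorted for stable labels.
    classes = sorted(
        d for d in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, d))
    )
    class_to_idx = {name: i for i, name in enumerate(classes)}

    samples = []
    for name in classes:
        split_dir = os.path.join(root_dir, name, split)
        if not os.path.isdir(split_dir):
            continue  # this class has no folder for the requested split
        for fname in sorted(os.listdir(split_dir)):
            if fname.lower().endswith(ext):
                samples.append((os.path.join(split_dir, fname), class_to_idx[name]))
    return samples, class_to_idx
```

With this, __len__ could return len(samples), and __getitem__ could call input_stl on samples[idx][0] and build the feature array for just that one file. Files that input_stl rejects would then need handling at access time rather than up front.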
root/class_x/xxx.ext
root/class_x/xxy.ext
root/class_x/xxz.ext
root/class_y/123.ext
root/class_y/nsdf3.ext
root/class_y/asd932_.ext
The only difference is that I have train and test folders inside class_x and class_y, and I only want to load the training .stl files.
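A possible way to bridge that difference is DatasetFolder's is_valid_file argument, which takes a file path and returns whether to keep it. As far as I can tell from the torchvision source, DatasetFolder walks each class folder recursively, so files under class_x/train should be found and can then be filtered by their parent folder. A minimal sketch under that assumption follows. The names is_train_stl and make_train_folder are made up for illustration, and loader stands for any function that maps a file path to a sample.

```python
import os


def is_train_stl(path):
    """Accept only .stl files whose immediate parent folder is named 'train'."""
    parent = os.path.basename(os.path.dirname(path))
    return parent == "train" and path.lower().endswith(".stl")


def make_train_folder(root_dir, loader):
    # Imported here so the predicate above carries no torchvision dependency.
    from torchvision.datasets import DatasetFolder

    # extensions is left unset: DatasetFolder does not allow passing both
    # extensions and is_valid_file.
    return DatasetFolder(root_dir, loader=loader, is_valid_file=is_train_stl)
```

Each item of the resulting dataset would be a (loader(path), class_index) pair, with class indices assigned from the sorted class folder names, so no separate LabelEncoder should be needed.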
Thanks a lot.