I want to train a CNN on my own dataset. Following the guide, I defined my own dataset by subclassing `torch.utils.data.Dataset`. I created a txt file that lists the paths of all my training images, and my `__getitem__` method opens an image file and returns the image. This turns out to be too slow, probably because every epoch opens and decodes each image file again. Should I read all the images at once instead (e.g. keep a global variable `trainX` of shape `(trainingdata_num, feature, w, h)`)? Or is there a cleverer way?
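Whether a preloaded `trainX` array is feasible mostly comes down to RAM. Here is a back-of-the-envelope estimate, assuming every image is resized to 128x128 RGB as in the code and all images are stored in one `(N, C, H, W)` array. The helper name `dataset_bytes` and the 10,000-image count are hypothetical.

```python
def dataset_bytes(num_images: int, height: int = 128, width: int = 128,
                  channels: int = 3, bytes_per_value: int = 1) -> int:
    """Bytes needed to hold every image in one (N, C, H, W) array."""
    return num_images * channels * height * width * bytes_per_value


if __name__ == "__main__":
    n = 10_000  # hypothetical dataset size
    as_uint8 = dataset_bytes(n)                       # raw 0-255 pixel values
    as_float32 = dataset_bytes(n, bytes_per_value=4)  # same data as float32
    print(f"uint8:   {as_uint8 / 1e6:.1f} MB")
    print(f"float32: {as_float32 / 1e6:.1f} MB")
```

Storing raw `uint8` pixels and converting to float only inside `__getitem__` (e.g. with `transforms.ToTensor()`) keeps the array four times smaller than a float32 copy.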
Below is my current code. The variable `self.data` holds the paths of all training images (plus their labels when `have_label` is true).
```python
import os

import torch.utils.data as Data
from PIL import Image


class MyDataset(Data.Dataset):
    def __init__(self, txt_path, img_path, transform=None, have_label=True):
        self.img_path = img_path
        self.have_label = have_label
        self.transform = transform
        data = []
        with open(txt_path) as f:
            for line in f:
                if self.have_label:
                    # The label is the prefix before the first "_" in the file name
                    data.append((line.strip(), int(line.split("_")[0])))
                else:
                    data.append(line.strip())
        self.data = data

    def __getitem__(self, index):
        if self.have_label:
            name, label = self.data[index]
        else:
            name = self.data[index]
        # The file is opened and decoded on every call, i.e. once per epoch
        img = Image.open(os.path.join(self.img_path, name)).convert("RGB")
        img = img.resize((128, 128))
        if self.transform is not None:
            img = self.transform(img)
        if self.have_label:
            return img, label
        return img

    def __len__(self):
        return len(self.data)
```
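An alternative that keeps the structure of `MyDataset` is to cache each decoded, resized image the first time it is requested, so only the first epoch pays the decoding cost; `preload=True` fills the cache up front, which is the `trainX` idea in dataset form. This is a minimal sketch, not a drop-in replacement: the class name `CachedImageDataset`, the `loader` and `preload` parameters, and the `pil_loader` helper are hypothetical. It is written as a plain class with `__getitem__`/`__len__` so it runs without torch; in practice you would subclass `Data.Dataset` as above.

```python
import os
from typing import Any, Callable, Dict, List, Optional, Tuple


def pil_loader(path: str, size: Tuple[int, int] = (128, 128)) -> Any:
    """Decode one image the same way MyDataset does: RGB, resized to 128x128."""
    from PIL import Image  # imported lazily so the class below does not require PIL

    with Image.open(path) as img:
        # convert() returns a new, fully loaded image, so closing the file is safe
        return img.convert("RGB").resize(size)


class CachedImageDataset:
    """Map-style dataset that decodes each file at most once per process."""

    def __init__(
        self,
        txt_path: str,
        img_path: str,
        transform: Optional[Callable[[Any], Any]] = None,
        have_label: bool = True,
        loader: Callable[[str], Any] = pil_loader,  # hypothetical hook, swappable for tests
        preload: bool = False,  # hypothetical flag: decode everything up front
    ) -> None:
        self.img_path = img_path
        self.transform = transform
        self.have_label = have_label
        self.loader = loader
        self.data: List[Any] = []
        with open(txt_path) as f:
            for line in f:
                name = line.strip()
                if not name:  # skip blank lines, e.g. a trailing newline
                    continue
                if have_label:
                    # Same labelling rule as MyDataset: the prefix before the first "_"
                    self.data.append((name, int(name.split("_")[0])))
                else:
                    self.data.append(name)
        self._cache: Dict[int, Any] = {}
        if preload:
            for index in range(len(self.data)):
                self._load(index)

    def _load(self, index: int) -> Any:
        # Decode on first access only; later epochs reuse the cached image
        if index not in self._cache:
            entry = self.data[index]
            name = entry[0] if self.have_label else entry
            self._cache[index] = self.loader(os.path.join(self.img_path, name))
        return self._cache[index]

    def __getitem__(self, index: int) -> Any:
        img = self._load(index)
        # Transforms run after the cache, so random augmentation still
        # differs from epoch to epoch
        if self.transform is not None:
            img = self.transform(img)
        if self.have_label:
            return img, self.data[index][1]
        return img

    def __len__(self) -> int:
        return len(self.data)
```

The cache lives in whichever process calls `__getitem__`. With `DataLoader(..., num_workers=0)` it persists across epochs; with `num_workers > 0` each worker builds its own copy, and workers are recreated every epoch unless `persistent_workers=True` is passed (PyTorch 1.7 or later). Independently of caching, raising `num_workers` lets several processes decode images in parallel, which is often enough to hide the loading cost.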