Hi.
Because of the size of the dataset, I decided to load all the images into memory in the __init__()
function of my dataloader class.
def __init__(self, image_file_path, img_transform=None, loader=default_loader):
    # Maps "folder/pic_name.jpg" -> raw (decoded) image bytes, all held in RAM.
    self.ref = {}
    with open(os.path.join(os.path.dirname(image_file_path), "data.csv")) as f:
        for line in f:
            # Each row is tab-separated; column 6 holds the base64-encoded image.
            line = line.strip().split('\t')
            folder_name = line[0]
            dst = folder_name
            pic_name = '-'.join((line[1], line[4])) + '.jpg'
            imgdata = base64.b64decode(line[6])
            self.ref[os.path.join(dst, pic_name)] = imgdata
    self.img_transform = img_transform
    self.loader = loader
But when I use num_workers != 0, I find that RAM usage doubles. It seems that the original dataloader class keeps one copy of the data, and the workers share another copy.
Can I release the memory occupied by the original dataloader class and let the workers share the data? Or how can I keep only one copy of the data in memory instead of two?
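
To make the idea of "one copy" concrete, here is a minimal sketch of one possible layout, not a confirmed fix. It assumes the workers are forked (the usual default on Linux), so memory is shared copy-on-write, and that much of the duplication comes from Python touching reference counts on the many small bytes objects in self.ref. The class name PackedImageStore and its get() method are hypothetical and not part of PyTorch; the sketch only packs all decoded images into a single buffer with an offset table, so there are far fewer Python objects to touch.

```python
import base64
from array import array


class PackedImageStore:
    """Hypothetical helper: holds all decoded image bytes in one buffer."""

    def __init__(self, rows):
        # rows: iterable of (key, base64_string) pairs, e.g. parsed from data.csv
        self.blob = bytearray()          # single contiguous buffer for all images
        self.offsets = array("Q", [0])   # offsets[i]..offsets[i + 1] delimits image i
        self.keys = []                   # small per-image metadata only
        for key, b64 in rows:
            self.blob += base64.b64decode(b64)
            self.offsets.append(len(self.blob))
            self.keys.append(key)

    def __len__(self):
        return len(self.keys)

    def get(self, index):
        # Copies only the requested image out of the shared buffer.
        start, end = self.offsets[index], self.offsets[index + 1]
        return bytes(self.blob[start:end])


# Tiny usage example with made-up data standing in for rows of data.csv.
rows = [
    ("a/1-x.jpg", base64.b64encode(b"hello").decode()),
    ("a/2-y.jpg", base64.b64encode(b"world!").decode()),
]
store = PackedImageStore(rows)
first_image = store.get(0)
```

In a dataset class, __getitem__ could call store.get(index) and pass the bytes to the loader. Whether this actually removes the second copy would depend on the start method and platform, so it would need to be measured.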