On Windows 10, I know you are supposed to do DataLoader-related tasks like the following:
# main.py
def main():
    # do multiprocessing with the data loader here
    ...

if __name__ == '__main__':
    main()
However, I decided to move things around, and now I do all my training in a separate module that I import into main.py. This creates a problem when using DataLoader with num_workers > 0 on Windows 10: my entire script gets executed multiple times (this honestly still makes no sense to me; why is normal forking not possible on Windows?). So, now my flow of execution is:
# main.py
from train import train

# some preparatory code here
train(x)

##########################
##########################

# train.py
def train(x):
    # initial data processing
    loader = DataLoader(dataset=_DATASET, batch_size=batch_size, shuffle=False, num_workers=4)
    # data post processing
However, the above doesn’t work. I even tried adding the if-condition in main.py:
# main.py
from train import train

# some preparatory code here
if __name__ == '__main__':
    train(x)

##########################
##########################

# train.py
def train(x):
    # initial data processing
    loader = DataLoader(dataset=_DATASET, batch_size=batch_size, shuffle=False, num_workers=4)
    # data post processing
But the entire script still gets executed multiple times (same problem as before). Then I tried adding the if-condition inside train.py:
# main.py
from train import train

# some preparatory code here
train(x)

##########################
##########################

# train.py
def train(x):
    # initial data processing
    if __name__ == 'utils':
        loader = DataLoader(dataset=_DATASET, batch_size=batch_size, shuffle=False, num_workers=4)
        for i, (image_batch, image_batch_flipped) in enumerate(loader):
            masked_img_array.append(image_batch)
            masked_img_flipped_array.append(image_batch_flipped)
        for frame in data[start_index:end_index]:
            if frame['labels'] is not None:
                unmasked_img_array.append(frame['name'])
    # data post processing done outside the if-condition (is this correct?)
    fp = open(os.path.join(MASTER_ROOT_DIR, script_args['ROOT_DIR'], "bbox_files",
                           "bboxes" + str(start_index) + "to" + str(end_index)) + ".txt", "rb")
    list_of_dicts = pickle.load(fp)
    all_agents = []
    for d in list_of_dicts:
        for agent in d.keys():
            all_agents.append(agent)
    all_agents_unique = list(set(all_agents))
    print("TOTAL UNIQUE AGENTS " + str(len(all_agents_unique)))
    trainFile.write("TOTAL UNIQUE AGENTS " + str(len(all_agents_unique)) + "\n")
    print(len(masked_img_array))
    return all_agents_unique, list_of_dicts, masked_img_array, masked_img_flipped_array, unmasked_img_array
But now the DataLoader doesn't load my data into the arrays masked_img_array, masked_img_flipped_array, and unmasked_img_array.
So, should I just go back to the original method, or is there a way around it where you can do multiprocessing in a separate module?
Edit: Actually, after re-trying the most recent code (I had been using the wrong name in the if-condition), I again face the same problem as before, where the whole script gets executed multiple times.