How to use DataLoader in a separate module with num_workers > 0? (multiprocessing on Windows 10)

On Windows 10, I know you are supposed to guard DataLoader-related work like the following:

def main():
    # do the multiprocessing / DataLoader work here

if __name__ == '__main__':
    main()
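As I understand it, the guard matters because top-level statements run on every import, and `__name__` only equals `'__main__'` in the script you actually launch; on Windows there is no fork(), so each worker process is a freshly spawned interpreter that re-imports the main module. A quick sanity check of the import behavior (using a throwaway module I've named `mymod` for illustration):

```python
import importlib.util
import os
import tempfile
import textwrap

# Write a throwaway module to disk; its top-level code runs on import.
src = textwrap.dedent("""
    executed = True       # top-level statement: runs on *every* import
    name_seen = __name__  # the module's own name, not '__main__'
""")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mymod.py")
    with open(path, "w") as f:
        f.write(src)
    spec = importlib.util.spec_from_file_location("mymod", path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)

print(mod.name_seen)  # mymod
print(mod.executed)   # True
```

So any code sitting at module level, outside the guard, runs again in every spawned worker, which is (I believe) why the guard has to protect everything with side effects.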

However, I decided to move things around, and now I do all my training in a separate module that I import into my main.py.

This creates a problem when using DataLoader with num_workers > 0 on Windows 10: my entire script gets executed multiple times. (This honestly still makes no sense to me; why is normal forking not possible on Windows?) So now my flow of execution is:

#main.py
from train import train

#some preparatory code here
train(x)

##########################
##########################
#train.py
from torch.utils.data import DataLoader

def train(x):
    # initial data processing
    loader = DataLoader(dataset=_DATASET, batch_size=batch_size, shuffle=False, num_workers=4)
    # data post processing

However, the above doesn't work. I even tried adding the __name__ guard in main.py:

#main.py
from train import train

#some preparatory code here
if __name__ == '__main__':
    train(x)

##########################
##########################
#train.py
from torch.utils.data import DataLoader

def train(x):
    # initial data processing
    loader = DataLoader(dataset=_DATASET, batch_size=batch_size, shuffle=False, num_workers=4)
    # data post processing

But the entire script still gets executed multiple times (same problem as before). Then I tried adding the if condition inside train.py instead:

#main.py
from train import train

#some preparatory code here
train(x)

##########################
##########################
#train.py
import os
import pickle

from torch.utils.data import DataLoader

def train(x):
    # initial data processing
    # (masked_img_array etc. are lists initialized earlier in my code)
    if __name__ == 'utils':
        loader = DataLoader(dataset=_DATASET, batch_size=batch_size, shuffle=False, num_workers=4)

        for i, (image_batch, image_batch_flipped) in enumerate(loader):
            masked_img_array.append(image_batch)
            masked_img_flipped_array.append(image_batch_flipped)

        for frame in data[start_index:end_index]:
            if frame['labels'] is not None:
                unmasked_img_array.append(frame['name'])

    # data post processing done outside the if-condition (is this correct?)
    with open(os.path.join(MASTER_ROOT_DIR, script_args['ROOT_DIR'], "bbox_files",
                           "bboxes" + str(start_index) + "to" + str(end_index) + ".txt"), "rb") as fp:
        list_of_dicts = pickle.load(fp)

    all_agents = []
    for d in list_of_dicts:
        for agent in d.keys():
            all_agents.append(agent)

    all_agents_unique = list(set(all_agents))
    print("TOTAL UNIQUE AGENTS " + str(len(all_agents_unique)))
    trainFile.write("TOTAL UNIQUE AGENTS " + str(len(all_agents_unique)) + "\n")  # trainFile is opened elsewhere
    print(len(masked_img_array))

    return all_agents_unique, list_of_dicts, masked_img_array, masked_img_flipped_array, unmasked_img_array

But now the DataLoader doesn't load my data into the arrays masked_img_array, masked_img_flipped_array, and unmasked_img_array.

So, should I just go back to the original method, or is there a way around it so that the multiprocessing can live in a separate module?

Edit: Actually, after trying the most recent code (I was using the wrong name in the if condition), I again face the same problem as before, where the whole script gets executed multiple times.
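For what it's worth, here is a minimal sketch (no torch, hypothetical names) of the layout I believe should be safe: train() lives in its own module with nothing but definitions at module level, and the entry script keeps all preparatory code inside main() behind the guard, so the re-import done by spawned workers has nothing extra to execute. To keep this runnable as a single file, the fake train.py is written to a temporary directory and imported from there:

```python
import importlib.util
import os
import tempfile
import textwrap

# train.py — safe to import from worker processes: no side effects
# at module level, only definitions.
TRAIN_SRC = textwrap.dedent("""
    def train(x):
        # stand-in for: build DataLoader(num_workers=4) and iterate it
        return [item * 2 for item in x]
""")

# Simulate the two-file layout by writing train.py out and importing it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.py")
    with open(path, "w") as f:
        f.write(TRAIN_SRC)
    spec = importlib.util.spec_from_file_location("train", path)
    train_mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(train_mod)

def main():
    x = [1, 2, 3]  # preparatory code lives inside main(), not at top level
    return train_mod.train(x)

if __name__ == "__main__":
    print(main())  # [2, 4, 6]
```

The key point, as I understand it, is that no code with side effects sits at module level in either file, so the spawn-and-re-import cycle on Windows finds nothing to re-run.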