Hello,
I want to make a small tool that pre-splits a dataset before training starts.
What I did:
- split the original test set into three subsets (returned as a list) using torch.utils.data.random_split, and wrap each subset in its own DataLoader,
- save all three testloaders to three *.pt files on disk using torch.save,
- reload the three *.pt files from disk into new testloaders using torch.load.
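Roughly, the three steps look like this (a sketch with a fake stand-in test set; the sizes, batch size, and file names are placeholders, not my real data):

```python
import os
import tempfile

import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Hypothetical stand-in for the real test set: 30 fake 1x8x8 "images".
full_test = TensorDataset(torch.randn(30, 1, 8, 8), torch.arange(30))

# Step 1: split into three fixed subsets and wrap each in a DataLoader.
parts = random_split(full_test, [10, 10, 10],
                     generator=torch.Generator().manual_seed(0))
loaders = [DataLoader(p, batch_size=5, shuffle=False) for p in parts]

# Step 2: save each loader to its own *.pt file.
out_dir = tempfile.mkdtemp()
for i, dl in enumerate(loaders):
    torch.save(dl, os.path.join(out_dir, f"testloader{i + 1}.pt"))

# Step 3: reload the three files into new loaders.
# (weights_only=False lets newer PyTorch unpickle a full DataLoader object.)
reloaded = [torch.load(os.path.join(out_dir, f"testloader{i + 1}.pt"),
                       weights_only=False)
            for i in range(3)]

# With shuffle=False the first batches match after the round trip.
print(torch.equal(next(iter(loaders[0]))[0], next(iter(reloaded[0]))[0]))  # -> True
```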
Basically, I am following the suggestion here for saving/loading the dataloader.
When I compare the post-step-1 testloader1 against the post-step-3 reload_testloader1 (I take the first batch from each and plot the images), I find they are not the same.
Initially I suspected some random/shuffle flag was involved, but since all three testloaders should already be fixed by the end of step 1, I couldn't figure out where the problem could be.
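Here is the toy check I used to convince myself what shuffle does to the first batch (hypothetical dataset, not my real one): a fresh iterator over a shuffle=True loader draws a new permutation each time, while shuffle=False is stable.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(12.0))

# shuffle=False: the first batch is identical on every fresh iteration.
plain = DataLoader(ds, batch_size=4, shuffle=False)
a = next(iter(plain))[0]
b = next(iter(plain))[0]
print(torch.equal(a, b))  # -> True

# shuffle=True: each fresh iterator draws a new permutation, so the
# first batch generally differs between iterations.
mixed = DataLoader(ds, batch_size=4, shuffle=True,
                   generator=torch.Generator().manual_seed(0))
c = next(iter(mixed))[0]
d = next(iter(mixed))[0]
print(c, d)  # typically different orderings
```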
Is there anything tricky about torch.save/torch.load for a DataLoader object? The save/load code is quite simple:
import torch
from torch.utils.data import DataLoader

def save_dl(dataloader_obj: DataLoader, file_name: str) -> None:
    """Save the dataloader to a .pt file."""
    torch.save(dataloader_obj, file_name)

def load_dl(file_name: str) -> DataLoader:
    """Load the dataloader from a .pt file."""
    return torch.load(file_name)