I am concerned about the reproducibility of my training runs.
Is there a way to use shuffle=True and still keep the results reproducible?
Let’s say I use:
import torch

def set_seeds(seed: int = 42):
    """Sets random seeds for torch operations.

    Args:
        seed (int, optional): Random seed to set. Defaults to 42.
    """
    # Set the seed for general torch operations
    torch.manual_seed(seed)
    # Set the seed for CUDA torch operations (ones that happen on the GPU)
    torch.cuda.manual_seed(seed)
together with the DataLoader:
train_dataloader = DataLoader(dataset=train_data,
                              collate_fn=None,
                              batch_size=None,  # how many samples per batch?
                              num_workers=1,  # how many subprocesses to use for data loading? (higher = more)
                              shuffle=True,
                              pin_memory=True)
I will probably also split the data (train, val).
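For the split, this is a minimal sketch of what I have in mind, assuming `torch.utils.data.random_split` with its own seeded `torch.Generator`. The names `full_data` and `split_train_val`, the toy dataset, and the 80/20 ratio are placeholders for illustration, not my real code.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy stand-in for the real image dataset (assumption for illustration).
full_data = TensorDataset(torch.arange(10))


def split_train_val(dataset, val_fraction: float = 0.2, seed: int = 42):
    """Split a dataset into train/val subsets using a dedicated seeded generator."""
    n_val = int(len(dataset) * val_fraction)
    n_train = len(dataset) - n_val
    # A separate generator keeps the split independent of other random calls.
    g = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val], generator=g)


# Two calls with the same seed should select the same indices.
train_a, val_a = split_train_val(full_data)
train_b, val_b = split_train_val(full_data)
```

If this works as I expect, `train_a.indices` and `train_b.indices` should be identical, so the same images end up in train and val on every run.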
How do I keep the order of the images the same from run to run?
Because of my earlier problem with the DataLoader (Wrong/different image shapes after DataLoader; bug in DL?), it seems that I have to use shuffle.
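One option I am considering, as a rough sketch only: pass a seeded `torch.Generator` to the DataLoader through its `generator` argument, so that the shuffle order comes from that generator. The helper names `make_loader`, `seed_worker`, and `epoch_order`, the toy dataset, and `batch_size=4` are assumptions for illustration. I use `num_workers=0` here so the sketch runs anywhere; `worker_init_fn` only takes effect with worker subprocesses.

```python
import random

import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id: int) -> None:
    # Runs inside each worker process (only when num_workers > 0) and seeds
    # Python's `random` from the base seed PyTorch assigns to that worker.
    random.seed(torch.initial_seed() % 2**32)


def make_loader(dataset, seed: int = 42, num_workers: int = 0) -> DataLoader:
    # Dedicated generator, so the shuffle order does not depend on other
    # random calls made elsewhere in the program.
    g = torch.Generator()
    g.manual_seed(seed)
    return DataLoader(dataset, batch_size=4, shuffle=True,
                      num_workers=num_workers, worker_init_fn=seed_worker,
                      generator=g)


def epoch_order(loader: DataLoader) -> list:
    # Collect the sample values in the order the loader yields them.
    return [int(x) for (batch,) in loader for x in batch]


# Toy stand-in for train_data (assumption for illustration).
toy_data = TensorDataset(torch.arange(10))
first_run = epoch_order(make_loader(toy_data))
second_run = epoch_order(make_loader(toy_data))
```

As far as I understand, `first_run` and `second_run` should match because both loaders start from a generator with the same seed. A second epoch on the same loader would still be shuffled differently from the first, since the generator state advances, but the whole sequence of epochs should repeat on the next run.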