I am concerned about reproducibility.
Is there a way to use seeds together with shuffle=True and still keep runs reproducible?
Let’s say I use:
import torch

def set_seeds(seed: int = 42):
    """Sets random seeds for torch operations.

    Args:
        seed (int, optional): Random seed to set. Defaults to 42.
    """
    # Set the seed for general torch operations
    torch.manual_seed(seed)
    # Set the seed for CUDA torch operations (ones that happen on the GPU)
    torch.cuda.manual_seed(seed)
together with the DataLoader:
train_dataloader = DataLoader(dataset=train_data,
                              collate_fn=None,
                              batch_size=None,  # how many samples per batch?
                              num_workers=1,  # how many subprocesses to use for data loading? (higher = more)
                              shuffle=True,
                              pin_memory=True)
I will probably also split the data (train, val).
How do I keep the order of the images the same from run to run?
Because of my earlier problem with the DataLoader (Wrong/different image shapes after DataLoader; bug in DL?), it seems that I have to use shuffle.
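One option that might work here is passing a seeded torch.Generator to both random_split and the DataLoader, so the split and the shuffle order no longer depend on the global seed state. Below is a minimal sketch under some assumptions: a toy TensorDataset stands in for train_data, the helper names make_loaders and epoch_order are hypothetical, and num_workers is left at 0 (with worker processes, random augmentations inside the workers may need extra seeding).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split


def make_loaders(seed: int = 42):
    # Assumption: toy stand-in for train_data, 10 samples whose value equals their index.
    data = TensorDataset(torch.arange(10))

    # A seeded generator makes the train/val split identical across runs.
    split_gen = torch.Generator().manual_seed(seed)
    train_set, val_set = random_split(data, [8, 2], generator=split_gen)

    # A separate seeded generator fixes the shuffle order used by the loader.
    loader_gen = torch.Generator().manual_seed(seed)
    train_loader = DataLoader(train_set, batch_size=4, shuffle=True,
                              generator=loader_gen)
    val_loader = DataLoader(val_set, batch_size=4, shuffle=False)
    return train_loader, val_loader


def epoch_order(loader):
    # Flatten one epoch into a plain list of sample values.
    return [int(x) for (batch,) in loader for x in batch]


if __name__ == "__main__":
    first, _ = make_loaders(seed=42)
    second, _ = make_loaders(seed=42)
    # Both loaders were built from the same seed, so the orders should match.
    print(epoch_order(first) == epoch_order(second))
```

The order still changes from epoch to epoch within one run; what the seeded generators are meant to fix is that the same sequence of orders comes back when the script is restarted with the same seed.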