I know that my issue may look similar to other questions about repeatability of results, but I will try to defend the claim that it is different. Why? Let me explain:
I use Optuna to optimize hyperparameters. Let’s consider this simplified objective function:
import random

import numpy as np
import optuna
import torch

# For repeatability
torch.manual_seed(7)
random.seed(7)
np.random.seed(7)
optuna_seed = optuna.samplers.TPESampler(seed=10)

def objective(trial: optuna.trial.Trial):
    optimizer_name = trial.suggest_categorical("optimizer_name", fun_train_h)
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    ANN_instance = model_func(optimizer_name, lr, train_dataloader, val_dataloader)
    accuracy_val = loss_eval(ANN_instance, val_dataloader)
    filename = f"/trial_{trial.number}-loss={accuracy_val:.5f}.pth"
    torch.save(ANN_instance, filename)
    return accuracy_val
So the algorithm determines the best optimizer type from the list and the best value of the learning rate. It also saves every trained model to a file, so I can reload it later (I will skip determining the best file for clarity; it is the same script as the optimization):
ANN_instance = torch.load(best_checkpoint_path)
ANN_instance.eval()
accuracy_train = loss_eval(ANN_instance, train_dataloader) * 100
accuracy_val = loss_eval(ANN_instance, val_dataloader) * 100
accuracy_test = loss_eval(ANN_instance, test_dataloader) * 100
It gives me results:
Accuracy over Train: 29.73
Accuracy over Validation: 20.51
Accuracy over Test: 18.81
A bit poor, but it is just a quick-and-dirty test, so never mind. I can rerun the file multiple times and always get the same result; it is fully deterministic. This is the end of File1.
Now in File2 I want to repeat the training process to record a few additional metrics (not implemented yet), so I don't reload the checkpoint; I start training again, making sure the random seeds are fixed. The same hyperparameters, the very same dataloaders:
# For repeatability
torch.manual_seed(7)
random.seed(7)
np.random.seed(7)

# Results from the previous file
optimizer_name = "AdamW"
lr = 0.0005

ANN_instance = model_func(optimizer_name, lr, train_dataloader, val_dataloader)
ANN_instance.eval()
accuracy_train = loss_eval(ANN_instance, train_dataloader) * 100
accuracy_val = loss_eval(ANN_instance, val_dataloader) * 100
accuracy_test = loss_eval(ANN_instance, test_dataloader) * 100
And I get:
Accuracy over Train: 15.05
Accuracy over Validation: 14.8
Accuracy over Test: 14.07
Again, I can rerun the file as many times as I want and always get the same result, but it is different from File1.
My question is: why?
Edit: I think I figured it out myself:
def objective(trial: optuna.trial.Trial):
    torch.manual_seed(7)
    random.seed(7)
    np.random.seed(7)
This fixes the issue. But why is the global seed not respected, so that I have to repeat the seeding inside the Optuna objective?
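My understanding of the likely mechanism: seeding fixes the RNG *state* once, but every random draw (sampler suggestions, weight initialization, shuffling) advances that state. So the best trial in File1 started from whatever state was left after all previous trials, while File2 starts from the freshly seeded state. Re-seeding at the top of the objective resets the state for every trial, which is why it restores repeatability. A minimal sketch with Python's built-in `random` module (the same logic applies to `torch.manual_seed` and `np.random.seed`):

```python
import random

# Seeding fixes the RNG state, but every draw advances it.
random.seed(7)
a = random.random()  # a draw made by "trial 0" (fresh seeded state)
b = random.random()  # "trial 1" now starts from an advanced state

random.seed(7)       # re-seeding at the start of each trial restores the state
c = random.random()  # identical to the first draw

print(a == c, a == b)
```

Here `a == c` but `a != b`: two runs only match if they start from the same RNG state *and* make the same number of draws in between.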