Inconsistent results across multiple training runs

I am experiencing issues with a neural network yielding very different performance across multiple training runs. I know there will always be some variance due to training-set shuffling and random initialization, but the differences I observe are very large (to the point that I sometimes underperform my baselines and other times strongly outperform them). Obviously, I keep all parameters consistent across these runs. I should also add that I already use a one-vs-all cross-validation setup, which should at least make the results somewhat more consistent.

My guess is that my network gets stuck in different local minima during training, and I wonder what I could do about that. Regularization with dropout and/or weight decay only seems to make the results worse.

To restate: I do not aim to reproduce my results exactly on each training run, so setting random seeds does not solve my issue.

Any ideas?

You might want to see this thread: Non Reproducible result with GPU.
Also, if you are using a GPU, you may want to set the cuDNN backend to benchmark mode. Also, set the random seeds.
Add the following lines at the start of your script:

import torch
from torch.backends import cudnn
import numpy as np

# Seed the RNGs so every run starts from the same state
torch.manual_seed(0)
np.random.seed(0)

if torch.cuda.is_available():
    torch.cuda.manual_seed_all(0)
    cudnn.benchmark = True
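Since the goal here is stable conclusions rather than bit-exact reproduction, another option (my suggestion, not from the thread) is to train several times with different seeds and report the mean and standard deviation of your metric. A minimal sketch, using made-up accuracy values in place of real runs:

```python
from statistics import mean, stdev

# Hypothetical accuracies collected from five independent training runs
run_scores = [0.71, 0.84, 0.65, 0.79, 0.88]

avg = mean(run_scores)
spread = stdev(run_scores)  # sample standard deviation across runs
print(f"accuracy: {avg:.3f} +/- {spread:.3f}")
```

A large spread relative to the gap to your baselines tells you a single run (good or bad) is not meaningful on its own, which also makes comparisons against the baselines fairer.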
