Same seed (seeding code below), same machine. DP = DataParallel.
Single GPU: batch size 32, learning rate 0.01
4 GPUs DP: batch size 32, learning rate 0.0025
Should these two settings produce the same training result?
I expected them to, but my experiments on the CIFAR10 dataset show similar yet not identical training losses and accuracies.
By the way, what about DistributedDataParallel (DDP)? With 4 GPUs under DDP, batch size 32, learning rate 0.01: will this lead to the same result?
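For context on why the comparison is subtle: DataParallel runs one process that scatters each batch across the GPUs, while DistributedDataParallel typically runs one process per GPU, each loading its own batch. So "batch size 32" means different per-GPU and global batch sizes in the two setups. A minimal sketch of that arithmetic (the helper names are mine, not from any library):

```python
def dp_per_gpu_batch(global_batch, num_gpus):
    # DataParallel: one process; the batch is split (scattered) across GPUs.
    return global_batch // num_gpus

def ddp_global_batch(per_process_batch, num_processes):
    # DistributedDataParallel: one process per GPU, each loads its own batch.
    return per_process_batch * num_processes

# DP with batch size 32 on 4 GPUs: each GPU sees only 8 samples,
# which changes per-GPU BatchNorm statistics vs. single-GPU batch 32.
print(dp_per_gpu_batch(32, 4))    # 8

# DDP with batch size 32 per process on 4 GPUs: the global batch is 128.
print(ddp_global_batch(32, 4))    # 128
```

Because of this, even with identical seeds the per-GPU batch statistics (and hence any BatchNorm layers) differ between the settings, which alone can explain small divergences in loss.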
Thank you very much in advance!
import os
import random
import numpy as np
import torch

random.seed(seed)
np.random.seed(seed)
os.environ['PYTHONHASHSEED'] = str(seed)  # was misspelled 'PHTHONHASHSEED'
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True  # was misspelled 'deterministric'
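One gap the snippet above does not close: if the DataLoader runs with num_workers > 0, each worker is a separate process whose Python and NumPy RNGs are not covered by torch.manual_seed in the main process. A hedged sketch of a worker_init_fn (the helper name and the simple additive seed scheme are my assumptions; PyTorch's reproducibility notes suggest deriving worker seeds from torch.initial_seed() instead):

```python
import random
import numpy as np

def seed_worker(worker_id, base_seed=42):
    # Hypothetical worker_init_fn: give every DataLoader worker a
    # distinct but reproducible seed for Python's and NumPy's RNGs.
    worker_seed = (base_seed + worker_id) % 2**32
    random.seed(worker_seed)
    np.random.seed(worker_seed)
    return worker_seed

# Usage (sketch):
# DataLoader(dataset, batch_size=32, num_workers=4,
#            worker_init_fn=seed_worker)
```

Without something like this, shuffling and any augmentation done in workers can differ from run to run even when everything else is seeded.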