Weight initialization only identical when using the same GPU

I am running an experiment where the classifier of my network grows. As a result, I have to initialize a new classifier and then move the whole model back to the GPU. I was printing the weights of this classifier after initialization and noticed that I only get the same weights when I run the experiment on exactly the same GPU, for example GPU:0. I have set all the seeds, so I assumed the weights would always be the same!

This is how I am setting the seeds:

torch.manual_seed(args.rnd_seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
# Add more seeding? Maybe it will be necessary for the Bayesian model.
torch.cuda.manual_seed_all(args.rnd_seed)
torch.cuda.manual_seed(args.rnd_seed)
np.random.seed(args.rnd_seed)
random.seed(args.rnd_seed)

and determinism is also on:

torch.use_deterministic_algorithms(True, warn_only=True)
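
For completeness, this is roughly how I could bundle all of the above into a single helper (a sketch rather than the exact code I run; the CUBLAS_WORKSPACE_CONFIG line is an extra precaution for deterministic cuBLAS on CUDA 10.2+ and only becomes mandatory once warn_only is dropped):

import os
import random

import numpy as np
import torch

def seed_everything(seed: int) -> None:
    # Set the cuBLAS workspace config before CUDA is initialized.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)            # also seeds all CUDA devices
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.use_deterministic_algorithms(True, warn_only=True)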

What I observe is not aligned with my expectations, so I would appreciate any guidance.

That’s not the case as there is no guarantee the same algorithms will be used between different hardware generations etc. (on the GPU and CPU).

Thank you for your response. Could you elaborate a bit more on the hardware-generation point?
I am using the same type of GPU and CPU; I only change the GPU index, so I guess the CPU stays the same. What causes the difference in generation here?
After making the following change, I get the same results for the classifier.

Right after creating the new classifier, I re-set the seed and then initialize it explicitly:

torch.manual_seed(self.kwargs['rnd_seed'])  # re-seed right before initializing
small = True
nn.init.kaiming_normal_(self.model.fc.weight)
if small:
    self.model.fc.weight.data.mul_(0.001)
if self.model.fc.bias is not None:
    nn.init.constant_(self.model.fc.bias, 0)

This way, I get the same weights, even when I change the GPU, for example from GPU:0 to GPU:1.
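
For clarity, this is roughly what the re-initialization looks like when wrapped into a helper (reset_head is just an illustrative name and the scaling mirrors the snippet above; a sketch, not the exact code I run):

import torch
import torch.nn as nn

def reset_head(fc: nn.Linear, seed: int, small: bool = True) -> None:
    # Re-seed immediately before touching the weights so the result does not
    # depend on how many RNG calls happened earlier in the run.
    torch.manual_seed(seed)
    nn.init.kaiming_normal_(fc.weight)
    if small:
        with torch.no_grad():
            fc.weight.mul_(0.001)      # start from small weights
    if fc.bias is not None:
        nn.init.constant_(fc.bias, 0)

Calling this right after replacing self.model.fc gives me identical weights on every GPU.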

Sorry, I don’t fully understand your use case as I assumed you were using different GPU families. Could you post a minimal and executable code snippet showing the presumably unexpected behavior?

I am using this GitHub repo. The part of the code I am referring to is in /methods/finetune.py, in the function def before_task, line 121: self.model.fc = nn.Linear(in_features, new_out_features). This is a continual learning project. In such a setting, we train on a stream of datasets: after training task_0, we use the weights of the feature extractor, expand the classifier head (the line above), and train on task_1. I have adapted the repo code by adding the required reproducibility lines to the beginning of main.py:

torch.manual_seed(args.rnd_seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
# Add more seeding? maybe it will be necessary for the bayesian model
torch.cuda.manual_seed_all(args.rnd_seed)
torch.cuda.manual_seed(args.rnd_seed)
np.random.seed(args.rnd_seed)
random.seed(args.rnd_seed)

and the following line to the beginning of each module I'm using:

torch.use_deterministic_algorithms(True, warn_only=True)

I have a machine with 4 GPUs, all from the same family (NVIDIA Corporation GM200 [GeForce GTX TITAN X] (rev a1)). After printing the weights of the updated classifier, I noticed that they are only identical when the GPU index is the same. When I changed GPU:0 to GPU:1, the initialized weights were different. So, I read a bit and found out that this may be related to how random number generation works in Python: since we create and initialize the head multiple times, setting the seeds only at the beginning of the run is not enough for the classifier. We need to set the seed each time the classifier is created, and this should happen before it is explicitly initialized (I am not sure about this last part, though).
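
To make this concrete, here is a rough, self-contained sketch of what I mean; the model, the task loop, and the name expand_head are placeholders rather than the repo's actual code:

import torch
import torch.nn as nn

SEED = 1

class Net(nn.Module):
    # Stand-in for the real model: a tiny backbone plus a replaceable head.
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(32, 512)
        self.fc = nn.Linear(512, 10)

def expand_head(model: nn.Module, in_features: int, new_out_features: int,
                device: torch.device) -> None:
    # Re-seed right before the new head is created, so its weights no longer
    # depend on how many RNG calls happened earlier in training.
    torch.manual_seed(SEED)
    model.fc = nn.Linear(in_features, new_out_features)
    nn.init.kaiming_normal_(model.fc.weight)
    if model.fc.bias is not None:
        nn.init.constant_(model.fc.bias, 0)
    model.to(device)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = Net().to(device)
for num_classes in (10, 20, 30):       # the head grows after each task
    expand_head(model, 512, num_classes, device)
    print(num_classes, model.fc.weight.sum().item())

With the re-seeding in place, I would expect the printed sums to match regardless of the GPU index, which is what I see in my actual runs.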

Yes, you would need to make sure the same order of calls to the pseudo-random number generator is performed.
Let me know once you are able to create a minimal code snippet by reducing the code base and are still stuck.
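
If it helps, something along these lines can be used to check whether the RNG state already differs right before the head is created (just a sketch to illustrate the check, not code from your repo):

import hashlib

import torch

def rng_fingerprint() -> str:
    # Short hash of the current CPU RNG state, convenient for printing and
    # comparing across runs/GPUs.
    state = torch.random.get_rng_state().numpy().tobytes()
    return hashlib.md5(state).hexdigest()[:8]

# Print this right before self.model.fc = nn.Linear(...); if the fingerprints
# differ between runs on different GPUs, some earlier code consumed the
# generator a different number of times, and re-seeding before the creation
# (as you already do) is the straightforward fix.
print("CPU RNG state:", rng_fingerprint())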