Is it possible to fix the seed when the program is running on a Slurm cluster?


I added a line to fix the seed for random operations at the beginning of my code, like this:

import torch

import model  # definition of my model, which randomly initializes its parameters
import components
import optim

torch.manual_seed(123)


def train():
    ...

The imports come before the torch.manual_seed(123) line, but the model is created after it.

I ran the same code on an 8-GPU machine twice, and the results were quite close, with a difference of less than 0.1. After moving my code to a Slurm cluster, the difference grew to 0.2-0.8. One difference between the two setups is that Slurm allocates my job to different nodes each time I submit it, but I have no proof that this is related to the discrepancy.

Does torch.manual_seed(123) behave differently on different machines? How can I make the random behavior reproducible?
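For context, torch.manual_seed only seeds PyTorch's own RNGs; Python's random module, NumPy, and cuDNN's algorithm selection are separate sources of nondeterminism, and cuDNN's benchmark mode in particular can pick different kernels on different hardware, which could plausibly vary across cluster nodes. A minimal sketch of seeding all of them (the helper name set_full_determinism is my own, not a PyTorch API):

```python
import random

import numpy as np
import torch


def set_full_determinism(seed: int = 123) -> None:
    """Seed every RNG a typical PyTorch training script draws from."""
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy RNG (data loading, augmentation)
    torch.manual_seed(seed)           # PyTorch CPU RNG
    torch.cuda.manual_seed_all(seed)  # all visible GPUs (no-op without CUDA)
    # Force cuDNN to use deterministic kernels and disable autotuning,
    # which otherwise may select different algorithms per machine/run.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_full_determinism(123)
```

Call this once at the top of the script, before any model or data loader is constructed. Note that even with all seeds fixed, some CUDA ops are inherently nondeterministic, so small run-to-run differences on GPUs can remain.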