torch.manual_seed() not working

arnabsinha · September 16, 2024, 4:07pm

Hi everyone,

I am trying to reproduce identical random numbers using torch.manual_seed(), but it does not seem to be working. I tried the method below after referring to a few existing forum posts. Please let me know what is wrong.

import torch
from torch import nn, optim
import numpy as np

device = "cuda" if torch.cuda.is_available() else "cpu"

# torch.manual_seed(42)
import random
random.seed(42)
np.random.seed(42)
torch.cuda.manual_seed(42)
torch.backends.cudnn.deterministic = True

model_linear = LinearRegression()
model_linear.to(device=device)
model_linear.state_dict()

The last line gives different random values for the weights and biases on every run. I would appreciate any help.

Python version: 3.12.4
Torch version: 2.4.1

ptrblck · September 17, 2024, 12:19am

You are initializing the model on the CPU, so add torch.manual_seed(), which should also seed the device, by the way.
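
As a minimal sketch of the point above: seeding with torch.manual_seed() before the model is constructed should make the CPU-side initialization repeatable. The helper names seed_everything and make_layer are hypothetical and not from this thread; the layer shape simply mirrors the one-feature layers discussed here.

```python
import random

import torch
from torch import nn


def seed_everything(seed: int) -> None:
    # Hypothetical helper (not part of PyTorch): seeds Python's RNG and PyTorch.
    random.seed(seed)
    # torch.manual_seed seeds the default CPU generator and, per the PyTorch
    # docs, the generators on all devices.
    torch.manual_seed(seed)


def make_layer() -> nn.Linear:
    # The layer is created on the CPU, so its initial values are drawn from
    # the default CPU generator.
    return nn.Linear(in_features=1, out_features=1)


seed_everything(42)
first = make_layer()

seed_everything(42)
second = make_layer()

same_weight = torch.equal(first.weight, second.weight)
same_bias = torch.equal(first.bias, second.bias)
print(same_weight, same_bias)
```

The seed call has to come before the layers are created; seeding after construction does not change parameters that have already been initialized.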

arnabsinha · September 17, 2024, 1:08am

I uncommented torch.manual_seed(42), but my model's weights and biases still get different initial values every time.

Tony-Y · September 17, 2024, 2:43am

You can check torch.manual_seed with torch.randn, as in this MPS bug:

github.com/pytorch/pytorch

MPS: torch.manual_seed not working on metal (mps) for torch.randn

opened 06:11PM - 30 Aug 22 UTC

closed 04:01PM - 03 Jan 23 UTC

Vargol

triaged module: random module: mps

### 🐛 Describe the bug

Using torch.manual_seed to set a seed for torch.randn usage does not work: seeding with the same number does not bring back consistent results. For example, running the noddy script

```
import torch
torch.manual_seed(999)
print (torch.randn(3, device='cpu'))
torch.manual_seed(999)
print (torch.randn(3, device='cpu'))
torch.manual_seed(999)
print (torch.randn(3, device='mps'))
torch.manual_seed(999)
print (torch.randn(3, device='mps'))
torch.manual_seed(999)
print (torch.randn(3, device='cpu'))
torch.manual_seed(999)
print (torch.randn(3, device='cpu'))
```

returns

```
tensor([-0.8379, 0.4564, 0.3481])
tensor([-0.8379, 0.4564, 0.3481])
tensor([ 0.2612, 1.2390, -1.1139], device='mps:0')
tensor([-1.4076, 1.3567, -0.4096], device='mps:0')
tensor([-0.8379, 0.4564, 0.3481])
tensor([-0.8379, 0.4564, 0.3481])
```

Note the two MPS lines return different values. I'd expect them to be the same; ideally I'd like them to be the same as the CPU values, but appreciate that may not be possible.

BTW I've cut the following out of those results in case you wondered where it was :-)

```
/Volumes/Sabrent Media/Documents/Source/Python/diffuse/lib/python3.10/site-packages/torch/_tensor_str.py:114: UserWarning: The operator 'aten::masked_select' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  nonzero_finite_vals = torch.masked_select(
```

### Versions

```
Collecting environment information...
PyTorch version: 1.13.0.dev20220828
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 12.5.1 (arm64)
GCC version: Could not collect
Clang version: 13.1.6 (clang-1316.0.21.2.5)
CMake version: version 3.22.4
Libc version: N/A

Python version: 3.10.4 (main, May 10 2022, 03:52:14) [Clang 13.0.0 (clang-1300.0.29.30)] (64-bit runtime)
Python platform: macOS-12.5.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.2
[pip3] pytorch-lightning==1.4.2
[pip3] torch==1.13.0.dev20220828
[pip3] torch-fidelity==0.3.0
[pip3] torchaudio==0.13.0.dev20220827
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.14.0.dev20220827
[conda] Could not collect
```

BTW had to fix the collection script to deal with spaces in path names :-)

cc @pbelevich @kulinseth @albanD

ptrblck · September 17, 2024, 4:21am

Could you post the code for LinearRegression?

arnabsinha · September 17, 2024, 11:47am

Sure.

device = "cuda" if torch.cuda.is_available() else "cpu"

class LinearRegression(nn.Module):
    '''
        Class to define the neural network using Linear layers. Importing nn.Module is necessary whenever building any NN
    '''

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.layer1 = nn.Linear(in_features=1, out_features=1, bias=True, dtype=torch.float32)
        self.layer2 = nn.Linear(in_features=1, out_features=1, bias=True, dtype=torch.float32)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        self.forward1 = self.layer1(x)
        return self.layer2(self.forward1)

Tony-Y · September 17, 2024, 1:12pm

When manual_seed is working:

>>> from torch import Generator, randn
>>> g = Generator(device='cpu')
>>> g.manual_seed(42)
>>> randn(3, generator=g, device='cpu') # initially missing
tensor([0.3367, 0.1288, 0.2345])
>>> g.manual_seed(42)
>>> randn(3, generator=g, device='cpu')
tensor([0.3367, 0.1288, 0.2345])

Can you get a different result on your environment?

arnabsinha · September 17, 2024, 1:29pm

When I run it the first time, I get the same 3 values as you did. When I run randn(3, generator=g, device='cpu') multiple times, I get different values each time.

However, I still don’t understand how to fix the different values I am getting in my code.
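
The behaviour described here is expected: each draw advances the generator's internal state, so repeated calls differ until the generator is re-seeded. A short sketch, reusing the Generator from the post above (the variable names a, b, and c are only for illustration):

```python
import torch

g = torch.Generator(device="cpu")

g.manual_seed(42)
a = torch.randn(3, generator=g)  # first draw after seeding
b = torch.randn(3, generator=g)  # second draw: the generator state has advanced

g.manual_seed(42)                # re-seeding resets the state
c = torch.randn(3, generator=g)  # repeats the first draw

print(torch.equal(a, b))  # the two consecutive draws differ
print(torch.equal(a, c))  # the draw after re-seeding matches the first one
```

So a seed fixes the whole sequence of random numbers, not each individual call.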

arnabsinha · September 17, 2024, 1:48pm

I checked again. So torch.manual_seed(42) is giving me the same initial values when I create the model. What I do not understand is why torch.cuda.manual_seed(42) doesn’t work when I change my device to the GPU. What can I do to make it work?

Tony-Y · September 17, 2024, 1:50pm

Could you provide detailed information of your environment?

OS, CPU, etc.
How to install PyTorch: conda, pip, source

arnabsinha · September 17, 2024, 1:55pm

OS: Windows 11 64-bit operating system
CPU: Intel(R) Core™ i5-8300H CPU @ 2.30GHz 2.30 GHz

To install PyTorch, I used the command listed at “https://pytorch.org/”.
I installed Conda from “Download Now | Anaconda”, and it came with Python 3.12.4 and pip 24.0.

ptrblck · September 17, 2024, 2:22pm

Thanks for the model definition. Your code works as expected and returns the same random values after seeding the code:

python tmp.py
OrderedDict([('layer1.weight', tensor([[0.7645]], device='cuda:0')), ('layer1.bias', tensor([0.8300], device='cuda:0')), ('layer2.weight', tensor([[-0.2343]], device='cuda:0')), ('layer2.bias', tensor([0.9186], device='cuda:0'))])

python tmp.py 
OrderedDict([('layer1.weight', tensor([[0.7645]], device='cuda:0')), ('layer1.bias', tensor([0.8300], device='cuda:0')), ('layer2.weight', tensor([[-0.2343]], device='cuda:0')), ('layer2.bias', tensor([0.9186], device='cuda:0'))])

python tmp.py 
OrderedDict([('layer1.weight', tensor([[0.7645]], device='cuda:0')), ('layer1.bias', tensor([0.8300], device='cuda:0')), ('layer2.weight', tensor([[-0.2343]], device='cuda:0')), ('layer2.bias', tensor([0.9186], device='cuda:0'))])

cat tmp.py 
import torch
import torch.nn as nn

class LinearRegression(nn.Module):
    '''
        Class to define the neural network using Linear layers. Importing nn.Module is necessary whenever building any NN
    '''

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.layer1 = nn.Linear(in_features=1, out_features=1, bias=True, dtype=torch.float32)
        self.layer2 = nn.Linear(in_features=1, out_features=1, bias=True, dtype=torch.float32)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        self.forward1 = self.layer1(x)
        return self.layer2(self.forward1)

torch.manual_seed(42)

device = "cuda" if torch.cuda.is_available() else "cpu"
model_linear = LinearRegression()
model_linear.to(device=device)
print(model_linear.state_dict())

As previously explained:

The initial random values are created on the CPU before you move the model to the GPU, so you need to seed the host, too.
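
A small sketch of this difference, assuming the layers are created on the CPU as in tmp.py. The helper init_weight is hypothetical. torch.cuda.manual_seed() is documented as safe to call without CUDA (it is silently ignored), so this should also run on a CPU-only machine:

```python
import torch
from torch import nn


def init_weight() -> torch.Tensor:
    # Hypothetical helper: builds a CPU layer and returns a copy of its initial weight.
    layer = nn.Linear(in_features=1, out_features=1)
    return layer.weight.detach().clone()


# Seeding only the CUDA generator leaves the CPU generator untouched,
# so two CPU-side initializations draw different values.
torch.cuda.manual_seed(42)
w1 = init_weight()
torch.cuda.manual_seed(42)
w2 = init_weight()

# Seeding with torch.manual_seed resets the CPU generator as well.
torch.manual_seed(42)
w3 = init_weight()
torch.manual_seed(42)
w4 = init_weight()

print(torch.equal(w1, w2))  # CUDA-only seeding: initial weights differ
print(torch.equal(w3, w4))  # torch.manual_seed: initial weights match
```

Moving the model with .to(device) afterwards only copies these values, which would explain why the original script saw different numbers with torch.manual_seed(42) commented out.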

Tony-Y · September 17, 2024, 2:37pm

Could you try to test manual_seed after downgrading PyTorch to 2.3.1?

# CUDA 11.8
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia
# CUDA 12.1
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia
# CPU Only
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 cpuonly -c pytorch

arnabsinha · September 18, 2024, 1:50pm

Thanks for the help! @ptrblck

QQ: If I wish to initialize the model on GPU directly, what changes can I make to my code?

ptrblck · September 18, 2024, 2:17pm

You can pass the device argument to the layer initialization directly, e.g. via: self.layer1 = nn.Linear(..., device=device).
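
A minimal sketch of that suggestion applied to the LinearRegression class from this thread. The device parameter on __init__ is an assumption added for illustration; the original class does not take one.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"


class LinearRegression(nn.Module):
    def __init__(self, device: str = "cpu") -> None:
        super().__init__()
        # Passing device= creates (and initializes) the parameters on that device.
        self.layer1 = nn.Linear(in_features=1, out_features=1, device=device)
        self.layer2 = nn.Linear(in_features=1, out_features=1, device=device)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layer2(self.layer1(x))


# torch.manual_seed seeds all devices, so it also covers on-GPU initialization.
torch.manual_seed(42)
m1 = LinearRegression(device=device)
torch.manual_seed(42)
m2 = LinearRegression(device=device)

matches = all(torch.equal(p, q) for p, q in zip(m1.parameters(), m2.parameters()))
on_device = next(m1.parameters()).device.type == device
print(matches, on_device)
```

The values drawn on the GPU are not necessarily the same as those drawn on the CPU with the same seed, so results should only be compared between runs that initialize on the same device.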