Neural network initialized with random weights always returns the same output for random inputs

I have a problem with PyTorch in Spyder: a randomly initialized neural network always returns the same output, even for random input tensors. I am currently running on a local GPU from Spyder. I made sure that the weights are initialized randomly and are not all zeros.
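A check along these lines can confirm that (a minimal sketch; it assumes the Vgg model defined further down and only inspects weight tensors, not biases):

import torch

# Print per-layer weight statistics of a freshly constructed model to confirm
# the parameters are randomly initialized and not all zeros.
model = Vgg()
for name, param in model.named_parameters():
    if param.dim() > 1:  # weight tensors only, skip biases
        print(f"{name}: mean={param.mean():.4f}, std={param.std():.4f}, "
              f"min={param.min():.4f}, max={param.max():.4f}")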

Example:

# run inside a method of the Vgg model defined below, so `self` refers to the model
for i in range(20):
  with torch.no_grad():
    y = torch.rand(1, 3, 360, 640)
    y = self.stage_1(y)
    y = self.stage_2(y)
    y = self.stage_3(y)
    y = self.stage_4(y)
    y = self.stage_5(y)
    y = self.stage_6(y)
    y = torch.flatten(y, start_dim=1)
    y = self.fc_linear(y)
    y = self.fc_head(y)
    print(y)

always prints the same value. This is the code of the network:

import torch
import torch.nn as nn
import torch.nn.functional as F


class VggStage(nn.Module):
    def __init__(self,
                 input_channels: int,
                 output_channels: int) -> None:
        """
        

        Parameters
        ----------
        input_channels : int
            DESCRIPTION.
        output_channels : int
            DESCRIPTION.

        Returns
        -------
        None
            DESCRIPTION.

        """
        super().__init__()
        
        self.conv1 = nn.Conv2d(in_channels=input_channels,
                               out_channels=output_channels, 
                               kernel_size=(3, 3))
        self.conv2 = nn.Conv2d(in_channels=output_channels,
                               out_channels=output_channels, 
                               kernel_size=(3, 3))
        self.max_pool = nn.MaxPool2d(kernel_size=(2, 2),
                                     stride=(2, 2))
        
    def forward(self,
                x: torch.Tensor) -> torch.Tensor:

        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.max_pool(x)
        
        return x

class Vgg(nn.Module):
    def __init__(self) -> None:
        """
        

        Returns
        -------
        None
            DESCRIPTION.

        """
        super().__init__()

        self.stage_1 = VggStage(input_channels=3,
                                output_channels=16)
        self.stage_2 = VggStage(input_channels=16,
                                output_channels=32)
        self.stage_3 = VggStage(input_channels=32,
                                output_channels=64)
        self.stage_4 = VggStage(input_channels=64,
                                output_channels=128)
        self.stage_5 = VggStage(input_channels=128,
                                output_channels=256)
        self.stage_6 = VggStage(input_channels=256,
                                output_channels=512)

        # Dummy forward pass to infer the flattened feature size
        # that feeds the first fully connected layer.
        with torch.no_grad():
            x = torch.rand(1, 3, 360, 640)
            x = self.stage_1(x)
            x = self.stage_2(x)
            x = self.stage_3(x)
            x = self.stage_4(x)
            x = self.stage_5(x)
            x = self.stage_6(x)
            x = torch.flatten(x, start_dim=1)
        
        self.fc_linear = nn.Linear(x.shape[1], 256)
        self.fc_head = nn.Linear(256, 1)

I guess the penultimate activation might be full of zeros, in which case the output would just be the .bias of the last linear layer.
If so, you might want to use a better parameter initialization for your model.
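For example, a minimal sketch of such an initialization, assuming Kaiming (He) init is a reasonable choice for this ReLU network (the init_weights helper name is just illustrative):

import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    # Kaiming (He) initialization is a common choice for ReLU networks.
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = Vgg()
model.apply(init_weights)  # recursively applies init_weights to every submodule

This makes it less likely that the activations collapse toward zero as they pass through the six stages.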

The penultimate activation and the last one are both not full of zeros, as I wrote in the post.
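A minimal sketch of how that can be checked (it assumes the Vgg model defined above and just prints per-stage activation statistics for a random input):

import torch

model = Vgg()
with torch.no_grad():
    x = torch.rand(1, 3, 360, 640)
    for name in ["stage_1", "stage_2", "stage_3", "stage_4", "stage_5", "stage_6"]:
        x = getattr(model, name)(x)
        # fraction of exactly-zero entries after the ReLU/max-pool of each stage
        print(f"{name}: mean={x.mean():.6f}, std={x.std():.6f}, "
              f"zeros={100.0 * (x == 0).float().mean():.1f}%")
    x = torch.flatten(x, start_dim=1)
    x = model.fc_linear(x)
    print(f"fc_linear: mean={x.mean():.6f}, std={x.std():.6f}")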