Dense network returns the same output for every input before training

I’m trying to train a simple dense network that takes in 6 inputs and outputs 3. For some reason, regardless of what I put in, it always outputs the same thing right away. Training doesn’t seem to solve it.

Is there something I’m doing wrong here? I’m sure it’s something simple, but it’s driving me a bit nuts.

import torch
import torch.nn as nn

input_size = 6
hidden_size = 64
n_layers = 24
output_size = 3

# Build a deep stack of Linear + ReLU layers.
layers = [nn.Linear(input_size, hidden_size), nn.ReLU()]
for i in range(n_layers):
    layers.append(nn.Linear(hidden_size, hidden_size))
    layers.append(nn.ReLU())
layers.append(nn.Linear(hidden_size, output_size))

net = nn.Sequential(*layers)

# Five random input samples, six features each.
x = torch.normal(0, 1, (5, 6))
print(net(x))

This outputs something like

tensor([[ 0.0090, -0.0353, -0.0797],
        [ 0.0090, -0.0353, -0.0797],
        [ 0.0090, -0.0353, -0.0797],
        [ 0.0090, -0.0353, -0.0797],
        [ 0.0090, -0.0353, -0.0797]], grad_fn=<AddmmBackward>)

Hi, are you setting a seed somewhere in your code? I’m able to reproduce your issue by adding torch.random.manual_seed(1) at the top of your code and using a learning rate that is too high. The problem during training may be due to vanishing gradients or an optimizer that is not set up well.
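
If it helps, here is a rough sketch of how you could check for vanishing gradients after a single backward pass, using the net and x from your post. The MSE loss, random targets, and plain SGD are only placeholders I’m assuming for illustration:

# Sketch only: MSE loss, random targets, and plain SGD are assumptions for illustration.
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=1e-3)

y = torch.normal(0, 1, (5, 3))   # dummy targets with the same shape as the output
optimizer.zero_grad()
loss = criterion(net(x), y)
loss.backward()

# Print the gradient norm of each Linear layer's weights; norms that shrink
# toward zero in the earlier layers point to vanishing gradients.
for name, module in net.named_children():
    if isinstance(module, nn.Linear):
        print(name, module.weight.grad.norm().item())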

I’m not setting any seeds in my code. The outputs are different each time I run it, but the strange part is that all 5 of them are the same.

I’ll try lowering my learning rate; that’s a great suggestion. I’m still weirded out that all of those are the same BEFORE training. Is that normal behavior? I’ve just never heard of anything like it.

I assumed this was a classification problem and got a probability of roughly 1/3 for each class, with the same values for every sample. That is to be expected at the start (though perhaps not identical down to this many significant digits), since the network hasn’t learned anything yet and you get about 1/3 per class by chance. If you continue to get the same values a few epochs into training, that may be a problem.
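
For reference, and assuming the 3 outputs are meant as class logits (the post doesn’t say), the roughly uniform probabilities show up if you push the raw outputs through a softmax:

# Assumption: the 3 outputs are class logits. Near-zero logits give roughly 1/3 per class.
probs = torch.softmax(net(x), dim=1)
print(probs)   # each row is close to 1/3 per class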

Out of curiosity, could you try manually initializing the linear layer weights using a different distribution? I believe the default is uniform in [-1/sqrt(fan_in), 1/sqrt(fan_in)], where fan_in is the number of input features (the number of columns of the weight matrix).
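
Something along these lines would do it. This is just a sketch using Kaiming-normal initialization as the alternative distribution; any other scheme would plug in the same way:

# Sketch: reinitialize every Linear layer with Kaiming-normal weights and zero biases.
def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        nn.init.zeros_(m.bias)

net.apply(init_weights)
print(net(x))   # check whether the 5 rows are still (nearly) identical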