Network output is always zero after first backprop

Hi, I am just starting out with PyTorch. I have run into some very strange behaviour in my program, and I don’t know whether it is expected. My simple test code looks like this:

import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F
import numpy as np


def init_weights(m):
    if type(m) == nn.Linear:
        torch.nn.init.xavier_uniform_(m.weight) #works
        # torch.nn.init.normal_(m.weight,mean=1,std=1)  #doesn't work
        # torch.nn.init.uniform_(m.weight) #doesn't work

class Model(nn.Module):
    def __init__(self):
        super(Model,self).__init__()

        self.fc1 = nn.Linear(120,60)
        self.fc2 = nn.Linear(60,40)
        self.fc3 = nn.Linear(40,1)

    def forward(self,x):

        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x)) #note: ReLU on the output layer as well
        return x

model = Model()
model.apply(init_weights) #Disabling this line leads to the expected results
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(),lr=0.001, momentum=0.9)

model = model.cuda()
model.train()

printFreq=1

for epochNo in range(20):
    optimizer.zero_grad()

    targetV = torch.rand(8,1).cuda()+10 #targets are uniform in [10,11), mean 10.5
    inputV = torch.rand(8,120).cuda()

    output = model(inputV)

    loss = criterion(output,targetV)

    loss.backward()
    optimizer.step()

    if epochNo % printFreq == 0:
        print(output)

It’s just a toy example with random input. In theory, the network should learn to disregard the input and output the mean of the targets, which is about 10.5. With Xavier uniform initialization it works properly, but with uniform or normal initialization the network output is always zero after the first backprop. Is this normal behaviour? Thanks!
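
In case it helps with diagnosing this, here is a minimal check (just a sketch, reusing the names from my script; h and preAct are names I made up for this) that could be dropped into the training loop right after loss.backward(), to see whether the final ReLU receives a negative pre-activation and clamps everything to zero, and whether the gradients have died:

    #recompute the forward pass up to fc3 to inspect the value fed into the last ReLU
    with torch.no_grad():
        h = F.relu(model.fc2(F.relu(model.fc1(inputV))))
        preAct = model.fc3(h)
        print('fc3 pre-activation:', preAct.flatten())

    #check whether any gradient is still flowing through the layers
    for name, p in model.named_parameters():
        if p.grad is not None:
            print(name, 'grad norm:', p.grad.norm().item())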