Hi, I am just starting out with PyTorch. I have run into a very strange bug in my program, and I don't know whether it is the expected behaviour. My simple test code looks like this:

```
import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F
import numpy as np

def init_weights(m):
    if type(m) == nn.Linear:
        torch.nn.init.xavier_uniform_(m.weight)  # works
        # torch.nn.init.normal_(m.weight, mean=1, std=1)  # doesn't work
        # torch.nn.init.uniform_(m.weight)  # doesn't work

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(120, 60)
        self.fc2 = nn.Linear(60, 40)
        self.fc3 = nn.Linear(40, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        return x

model = Model()
model.apply(init_weights)  # disabling this line leads to the expected results
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
model = model.cuda()
model.train()
printFreq = 1
for epochNo in range(20):
    optimizer.zero_grad()
    targetV = torch.rand(8, 1).cuda() + 10
    inputV = torch.rand(8, 120).cuda()
    output = model(inputV)
    loss = criterion(output, targetV)
    loss.backward()
    optimizer.step()
    if epochNo % printFreq == 0:
        print(output)
```

It's just a toy example with random input. In theory, the network should learn to disregard the input and output the mean of the target, which is about 10.5 (the targets are drawn uniformly from [10, 11)). With Xavier uniform initialization it works properly, but with uniform or normal initialization the network output is always zero after the first backprop. Is this normal behaviour? Thanks!
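A minimal diagnostic sketch, run on the CPU so it needs no GPU, that reproduces the `normal_(mean=1, std=1)` case with the same layers, loss and optimizer settings and then inspects the value fed into the final `F.relu`. The fixed seed and the helpers `hidden` and `fc3_preactivation` are illustrative additions, not part of the original code. The assumption being checked is that the very large initial loss makes the first SGD step push the pre-activation of `fc3` below zero for every input; ReLU then outputs exactly 0 and passes back zero gradient, so no parameter gets a useful update afterwards.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim

torch.manual_seed(0)  # illustrative: fixed seed so the run is repeatable


class Model(nn.Module):
    # Same architecture as in the question, including the ReLU on the output layer.
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(120, 60)
        self.fc2 = nn.Linear(60, 40)
        self.fc3 = nn.Linear(40, 1)

    def hidden(self, x):
        # Hypothetical helper: activations that feed fc3 (after the second ReLU).
        x = F.relu(self.fc1(x))
        return F.relu(self.fc2(x))

    def forward(self, x):
        return F.relu(self.fc3(self.hidden(x)))


def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, mean=1, std=1)  # the "doesn't work" variant


model = Model()
model.apply(init_weights)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)


def fc3_preactivation(x):
    # Hypothetical helper: the value passed to the final ReLU. If it is <= 0 for
    # every sample, the output is 0 and the ReLU blocks all gradient.
    with torch.no_grad():
        return model.fc3(model.hidden(x))


x = torch.rand(8, 120)  # fixed probe batch
print("pre-activation before training, min:", fc3_preactivation(x).min().item())

losses = []
for step in range(5):
    optimizer.zero_grad()
    target = torch.rand(8, 1) + 10
    output = model(torch.rand(8, 120))
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

after = fc3_preactivation(x)
print("losses:", losses)
print("pre-activation after training, max:", after.max().item())
print("output:", model(x).flatten())
print("fc3 weight grad from the last step:", model.fc3.weight.grad.flatten())
```

If the run behaves as described in the question, the first loss is enormous, the pre-activation of `fc3` ends up negative for the whole probe batch, the printed outputs are all 0, and later losses sit near the mean squared target (roughly 110). Xavier uniform draws zero-mean weights with a small bound, so the initial outputs are far smaller, which is consistent with it not hitting this failure in the original run.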