Hi, I am using the following model for training the network:

```
import torch as th
import torch.nn as nn

class FemnistNet(nn.Module):
    def __init__(self):
        super(FemnistNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2)  # output shape (batch, 32, 28, 28)
        self.pool1 = nn.MaxPool2d(2, stride=2)  # output shape (batch, 32, 14, 14)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2)  # output shape (batch, 64, 14, 14)
        self.pool2 = nn.MaxPool2d(2, stride=2)  # output shape (batch, 64, 7, 7)
        self.fc1 = nn.Linear(3136, 2048)
        self.fc2 = nn.Linear(2048, 62)

    def forward(self, x):
        x = x.view(-1, 1, 28, 28)
        x = self.conv1(x)
        x = th.nn.functional.relu(x)
        x = self.pool1(x)
        x = self.conv2(x)
        x = th.nn.functional.relu(x)
        x = self.pool2(x)
        x = x.flatten(start_dim=1)
        x = self.fc1(x)
        l1_activations = th.nn.functional.relu(x)
        x = self.fc2(l1_activations)
        x = x.softmax(dim=1)
        return x, l1_activations
```
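For reference, here is a quick sanity check of the conv/pool shapes (the batch size of 4 is just an example), confirming that the stack produces 64 × 7 × 7 = 3136 features per sample, which matches `fc1`'s input size:

```python
import torch as th
import torch.nn as nn

x = th.randn(4, 1, 28, 28)
x = nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2)(x)   # (4, 32, 28, 28)
x = nn.MaxPool2d(2, stride=2)(x)                              # (4, 32, 14, 14)
x = nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2)(x)  # (4, 64, 14, 14)
x = nn.MaxPool2d(2, stride=2)(x)                              # (4, 64, 7, 7)
print(x.flatten(start_dim=1).shape)  # torch.Size([4, 3136])
```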

The default weight initialization is `kaiming_uniform`, and the model trains well with it. But when I initialize the weights with Xavier, as `th.nn.init.xavier_uniform_(self.fc1.weight)`, the model parameters become `NaN` for the dense/linear layers. What is the impact of the weight-initialization distribution? Why do the weights become `NaN` with `th.nn.init.xavier_uniform_(self.fc1.weight)`?

Different distributions work well in TensorFlow; I don't experience `NaN`s there.