Hi, I am using the following model for training the network:

```
import torch as th
import torch.nn as nn

class FemnistNet(nn.Module):
    def __init__(self):
        super(FemnistNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2)  # output shape (batch, 32, 28, 28)
        self.pool1 = nn.MaxPool2d(2, stride=2)  # output shape (batch, 32, 14, 14)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2)  # output shape (batch, 64, 14, 14)
        self.pool2 = nn.MaxPool2d(2, stride=2)  # output shape (batch, 64, 7, 7)
        self.fc1 = nn.Linear(3136, 2048)
        self.fc2 = nn.Linear(2048, 62)

    def forward(self, x):
        x = x.view(-1, 1, 28, 28)
        x = self.conv1(x)
        x = th.nn.functional.relu(x)
        x = self.pool1(x)
        x = self.conv2(x)
        x = th.nn.functional.relu(x)
        x = self.pool2(x)
        x = x.flatten(start_dim=1)
        x = self.fc1(x)
        l1_activations = th.nn.functional.relu(x)
        x = self.fc2(l1_activations)
        x = x.softmax(dim=1)
        return x, l1_activations
```
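For reference, here is a quick sanity check of the conv/pool shapes (the batch size of 4 is just an example), confirming that the stack produces 64 × 7 × 7 = 3136 features per sample, which matches `fc1`'s input size:

```python
import torch as th
import torch.nn as nn

x = th.randn(4, 1, 28, 28)
x = nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2)(x)   # (4, 32, 28, 28)
x = nn.MaxPool2d(2, stride=2)(x)                              # (4, 32, 14, 14)
x = nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2)(x)  # (4, 64, 14, 14)
x = nn.MaxPool2d(2, stride=2)(x)                              # (4, 64, 7, 7)
print(x.flatten(start_dim=1).shape)  # torch.Size([4, 3136])
```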

The default weight initialization is `kaiming_uniform`, and the model trains well with it. But when I initialize the weights with Xavier, as `th.nn.init.xavier_uniform_(self.fc1.weight)`, the model parameters become `NaN` for the dense/linear layers. What is the impact of the weight-initialization distribution? Why do the weights become `NaN` with `th.nn.init.xavier_uniform_(self.fc1.weight)`?

Different distributions work well in TensorFlow; I don't experience `NaN`s there.