Hi
I am using AlexNet for my project, and I modified the official code to:
import torch.nn as nn


class AlexNet(nn.Module):
    def __init__(self):
        super(AlexNet, self).__init__()
        # Convolutional feature extractor, unchanged from the official model
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        # Shared fully connected layers (the final 1000-way layer is removed)
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
        )
        # One 2-way linear head per binary output
        self.fc_cls = nn.ModuleList()
        for i in range(10):
            self.fc_cls.append(nn.Linear(4096, 2))

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), 256 * 6 * 6)
        x = self.classifier(x)
        # Returns a list of 10 tensors, one per binary head
        out_cls = [None] * 10
        for i in range(10):
            out_cls[i] = self.fc_cls[i](x)
        return out_cls
since I have 10 binary outputs. This code works fine. But when I add some weight initialisation to it, i.e. add the following code
# requires `import math` at the top of the file
for m in self.modules():
    if isinstance(m, nn.Conv2d):
        n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
        m.weight.data.normal_(0, math.sqrt(2. / n))
to the __init__ function, the model explodes. Here is the training output after a few iterations in the first epoch:
Epoch:[1][0/2543] Time:2.653 (2.653) Loss:27.8872 (27.8872) Avg:44.92 (44.92)
Epoch:[1][10/2543] Time:1.047 (0.758) Loss:78.1975 (67.7367) Avg:51.52 (70.67)
Epoch:[1][20/2543] Time:1.841 (0.758) Loss:nan (nan) Avg:16.45 (50.17)
Epoch:[1][30/2543] Time:0.542 (0.684) Loss:nan (nan) Avg:10.43 (38.51)
Epoch:[1][40/2543] Time:0.829 (0.642) Loss:nan (nan) Avg:13.55 (32.32)
Epoch:[1][50/2543] Time:0.632 (0.619) Loss:nan (nan) Avg:14.18 (28.57)
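For reference, here is a small sketch of the standard deviations that the formula above, sqrt(2 / n) with n = kernel_h * kernel_w * out_channels, gives for the five conv layers. It is plain Python and does not need PyTorch. The `conv_layers` list and the `he_std` helper are names made up for illustration; the kernel sizes and output channels are copied from the model.

```python
import math

# (kernel_size, out_channels) for each nn.Conv2d in self.features,
# copied from the model definition in this post
conv_layers = [(11, 64), (5, 192), (3, 384), (3, 256), (3, 256)]


def he_std(kernel_size, out_channels):
    """Std used by the init loop: sqrt(2 / n), n = k * k * out_channels."""
    n = kernel_size * kernel_size * out_channels
    return math.sqrt(2.0 / n)


stds = [he_std(k, c) for k, c in conv_layers]
for (k, c), s in zip(conv_layers, stds):
    print(f"kernel={k:2d} out_channels={c:3d} std={s:.4f}")
```

Note that the loop only touches nn.Conv2d modules, so the nn.Linear layers and all biases keep PyTorch's default initialisation.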
Can someone please tell me why this is happening? Thanks.
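In case it helps rule out a shape problem, this is a quick check that the hard-coded 256 * 6 * 6 in the view call matches the feature extractor. It is only a sketch: the 224x224 input size is an assumption (it is not stated above), and `out_size` and `layers` are hypothetical names used for illustration.

```python
def out_size(size, kernel, stride=1, padding=0):
    # Standard output-size formula for Conv2d / MaxPool2d
    # (dilation 1, floor rounding)
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) for each conv / pool layer in self.features, in order
layers = [
    (11, 4, 2),  # conv1
    (3, 2, 0),   # pool1
    (5, 1, 2),   # conv2
    (3, 2, 0),   # pool2
    (3, 1, 1),   # conv3
    (3, 1, 1),   # conv4
    (3, 1, 1),   # conv5
    (3, 2, 0),   # pool3
]

size = 224  # ASSUMED input resolution, not given in this post
for kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)

flat_features = 256 * size * size  # should equal 256 * 6 * 6 for 224x224 input
print(size, flat_features)
```

With that assumed input size the spatial size comes out as 6, so the flatten step is consistent with the first nn.Linear layer.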