Training AlexNet from scratch


I am using AlexNet for my project, and I modified the official code to:

import math
import torch.nn as nn

class AlexNet(nn.Module):

    def __init__(self):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(256 * 6 * 6, 4096),
            nn.Linear(4096, 4096),
        )

        self.fc_cls = nn.ModuleList()
        for i in range(10):
            self.fc_cls.append(nn.Linear(4096, 2))

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), 256 * 6 * 6)
        x = self.classifier(x)

        out_cls = [None] * 10

        for i in range(10):
            out_cls[i] = self.fc_cls[i](x)
        return out_cls

since I have 10 binary outputs. This code works fine. But when I add weight initialization to it, i.e. add the following code

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))

to the __init__ function, the loss explodes. Here is the output of the model after a few iterations in the first epoch:

Epoch:[1][0/2543]	Time:2.653 (2.653)	Loss:27.8872 (27.8872)	Avg:44.92 (44.92)	
Epoch:[1][10/2543]	Time:1.047 (0.758)	Loss:78.1975 (67.7367)	Avg:51.52 (70.67)	
Epoch:[1][20/2543]	Time:1.841 (0.758)	Loss:nan (nan)	Avg:16.45 (50.17)	
Epoch:[1][30/2543]	Time:0.542 (0.684)	Loss:nan (nan)	Avg:10.43 (38.51)	
Epoch:[1][40/2543]	Time:0.829 (0.642)	Loss:nan (nan)	Avg:13.55 (32.32)	
Epoch:[1][50/2543]	Time:0.632 (0.619)	Loss:nan (nan)	Avg:14.18 (28.57)	

Can someone please tell me why this is happening? Thanks.
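
For reference, the initialization loop above draws each convolution's weights from a zero-mean normal distribution with standard deviation sqrt(2 / n), where n = kernel height * kernel width * out_channels. The sketch below works out that value for the five convolutions in the posted model using only the standard library. The helper name he_fan_out_std is made up for illustration and is not a PyTorch function.

```python
import math

# (kernel_size, out_channels) for the five Conv2d layers in the posted model
CONV_LAYERS = [(11, 64), (5, 192), (3, 384), (3, 256), (3, 256)]


def he_fan_out_std(kernel_size, out_channels):
    """Std used by the posted loop: sqrt(2 / (k * k * out_channels))."""
    n = kernel_size * kernel_size * out_channels
    return math.sqrt(2.0 / n)


# One standard deviation per convolution, in network order
stds = [he_fan_out_std(k, c) for k, c in CONV_LAYERS]

for (k, c), s in zip(CONV_LAYERS, stds):
    print(f"{k}x{k} conv, {c} out channels: n = {k * k * c}, std = {s:.4f}")
```

Note that the loop only touches nn.Conv2d modules, so the nn.Linear layers keep whatever initialization they had before.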

What’s the learning rate?

0.01, the same as the learning rate in the AlexNet paper.

Did you try reducing it?
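
To illustrate why a smaller learning rate can stop this kind of blow-up, here is a toy sketch in plain Python. It is not the model from the post: it runs gradient descent on a one-dimensional quadratic f(w) = 0.5 * curvature * w**2, and the curvature value 250 is an arbitrary assumption chosen so that a step size of 0.01 diverges while 0.001 converges. The function name run_gd is likewise hypothetical.

```python
def run_gd(lr, curvature=250.0, w0=1.0, steps=20):
    """Gradient descent on f(w) = 0.5 * curvature * w**2; returns final |w|."""
    w = w0
    for _ in range(steps):
        grad = curvature * w  # derivative of the quadratic at w
        w -= lr * grad        # each step multiplies w by (1 - lr * curvature)
    return abs(w)


# |1 - 0.01 * 250| = 1.5 > 1, so the iterate grows on every step
diverged = run_gd(lr=0.01)

# |1 - 0.001 * 250| = 0.75 < 1, so the iterate shrinks on every step
converged = run_gd(lr=0.001)

print(f"lr=0.01  -> |w| = {diverged:.3e}")
print(f"lr=0.001 -> |w| = {converged:.3e}")
```

In this toy the iteration is stable only when lr < 2 / curvature. A real network has no single curvature value, so the safe range has to be found empirically, for example by retrying with 0.001.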