Hi, I am trying to initialize the weights of a conv net (with nn.Sequential) using a custom method.
With this initialization my network reaches only ~10% accuracy (on CIFAR-10 that is chance level, i.e. the network learns nothing). Without it I get ~58% accuracy, so the conv net can learn fine.
I am sure I am doing something wrong, but I don't know where the problem is. I would like to initialize the weights and biases using the random_weight and zero_weight methods below. Any help/advice is welcome. Thanks.
The code (some of it is from Stanford's CS231n course):
```python
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

# device, dtype, flatten, in_channel, channel_1, channel_2,
# num_classes and learning_rate are defined earlier in the notebook.

def random_weight(shape):
    """Kaiming normalization: sqrt(2 / fan_in)"""
    if len(shape) == 2:  # FC weight
        fan_in = shape[0]
    else:
        # conv weight [out_channel, in_channel, kH, kW]
        fan_in = np.prod(shape[1:])
    w = torch.randn(shape, device=device, dtype=dtype) * np.sqrt(2. / fan_in)
    w.requires_grad = True
    return w

def zero_weight(shape):
    return torch.zeros(shape, device=device, dtype=dtype, requires_grad=True)

def weights_init(m):
    if type(m) in [nn.Conv2d, nn.Linear]:
        m.weight.data = random_weight(m.weight.data.size())
        m.bias.data = zero_weight(m.bias.data.size())

class Flatten(nn.Module):
    def forward(self, x):
        return flatten(x)

model = nn.Sequential(
    nn.Conv2d(in_channel, channel_1, (5, 5), padding=2),
    nn.ReLU(),
    nn.Conv2d(channel_1, channel_2, (3, 3), padding=1),
    nn.ReLU(),
    Flatten(),
    nn.Linear(channel_2 * 32 * 32, num_classes),
)
model.apply(weights_init)

optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                      momentum=0.9, nesterov=True)
```
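For reference, I believe the same initialization could also be written with PyTorch's built-in nn.init helpers (a minimal sketch for comparison, not my current code; weights_init_builtin is just a name I made up, and it assumes the same model as above):

```python
import torch.nn as nn

def weights_init_builtin(m):
    # Hypothetical alternative to my weights_init, using torch.nn.init.
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        # He/Kaiming normal: std = sqrt(2 / fan_in), same formula as random_weight,
        # but PyTorch computes fan_in from the layer's weight layout itself.
        nn.init.kaiming_normal_(m.weight, mode='fan_in', nonlinearity='relu')
        nn.init.zeros_(m.bias)

model.apply(weights_init_builtin)
```

With this version I would not have to compute fan_in by hand, so if it behaves differently from my random_weight, that would also help me locate the problem.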