Uniform initialization of a complete ResNet network

Hello folks,

I am a little bit confused about weight initialization. Since I am training my network (ResNet18) from scratch, I would like to initialize the weights with a uniform distribution, using the following function:

def init_weights(m):
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_uniform_(m.weight, nonlinearity='relu')
    elif isinstance(m, nn.Linear):
        nn.init.kaiming_uniform_(m.weight, nonlinearity='relu')
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.kaiming_uniform_(m.weight, nonlinearity='relu')
    elif isinstance(m, nn.MaxPool2d):
        nn.init.kaiming_uniform_(m.weight, nonlinearity='relu')
    elif isinstance(m, nn.AdaptiveAvgPool2d):
        nn.init.kaiming_uniform_(m.weight, nonlinearity='relu')

Do I need to initialize every layer like this, or is there an easier way? And is it necessary to zero the biases? When I use the function above, I get a ValueError. Am I missing something?

ValueError: Fan in and fan out can not be computed for tensor with fewer than 2 dimensions

EDIT: Obviously the BatchNorm2d layers are the problem and cannot be initialized this way. Do I have to use a different initialization for these kinds of layers? And what about the pooling layers, do I need to initialize them at all?


Some parameters, such as the weight of batchnorm layers and the bias of other layers, cannot be used with these init methods, because they require a parameter with at least two dimensions.

The weight of nn.BatchNorm2d is initialized with ones by default and the bias with zeros, which seems to be the recommended approach now.
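If you still want to set those parameters explicitly, a minimal sketch (assuming m is the module passed to your init function) could use nn.init.constant_ instead of kaiming_uniform_, since the BatchNorm weight and bias are 1-D tensors:

import torch.nn as nn

def init_batchnorm(m):
    # BatchNorm2d weight and bias are 1-D, so kaiming_uniform_ cannot
    # compute fan-in/fan-out for them; use constant_ instead.
    if isinstance(m, nn.BatchNorm2d):
        nn.init.constant_(m.weight, 1.0)
        nn.init.constant_(m.bias, 0.0)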

Pooling layers don’t have any parameters, so you don’t need to initialize them.
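As for an easier way overall, here is a minimal sketch of an init_weights that only touches the layers with multi-dimensional weights (Conv2d and Linear), zeros their biases, and leaves BatchNorm at its defaults; nn.Module.apply then visits every submodule for you. The resnet18 construction line is just illustrative:

import torch.nn as nn
from torchvision import models

def init_weights(m):
    # Only Conv2d and Linear carry multi-dimensional weight tensors,
    # so kaiming_uniform_ works for them.
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_uniform_(m.weight, nonlinearity='relu')
        if m.bias is not None:          # ResNet convs use bias=False
            nn.init.zeros_(m.bias)
    # BatchNorm2d already defaults to weight=1, bias=0.
    # Pooling layers have no parameters at all.

model = models.resnet18(weights=None)   # or pretrained=False on older torchvision
model.apply(init_weights)               # recursively applies init_weights to all submodules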