Default Weight Initialization vs. my kaiming_uniform_ init

I read some topics and learned that most layers, including Conv2d and Linear, are initialized with kaiming_uniform_ by default. When I use this default initialization, my VGG trains well.
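For reference, this is roughly what the default reset_parameters does for a Conv2d (paraphrased from the PyTorch source for the common groups=1 case; exact code may differ between versions):

import math
import torch.nn as nn

# Paraphrase of the default Conv2d/Linear init (reset_parameters); note a=sqrt(5)
def default_reset(layer):
    nn.init.kaiming_uniform_(layer.weight, a=math.sqrt(5))
    if layer.bias is not None:
        # fan_in of a conv weight = in_channels * kernel_h * kernel_w (groups=1)
        fan_in = layer.in_channels * layer.kernel_size[0] * layer.kernel_size[1]
        bound = 1 / math.sqrt(fan_in)
        nn.init.uniform_(layer.bias, -bound, bound)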

I tried to re-initialize these layers with my own initialize_weights function, which also uses nn.init.kaiming_uniform_. But when I train my VGG with the same hyperparameters, the loss blows up to NaN.

Is there anything wrong with my code? Thanks for your help!

import torch.nn as nn

def initialize_weights(layer):
    # He (Kaiming) uniform init, scaled for ReLU, on every conv layer
    if isinstance(layer, nn.Conv2d):
        nn.init.kaiming_uniform_(layer.weight, mode='fan_in', nonlinearity='relu')
        if layer.bias is not None:
            nn.init.constant_(layer.bias, 0)

# apply recursively to all submodules
model.apply(initialize_weights)
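To see the scale difference concretely, here is a quick check I'd expect to show it (a single Conv2d standing in for one VGG conv layer; the printed numbers are approximate):

import torch.nn as nn

conv = nn.Conv2d(256, 256, kernel_size=3)
print(conv.weight.std().item())   # default init: ~0.012

initialize_weights(conv)
print(conv.weight.std().item())   # my init: ~0.029, about sqrt(6) times larger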

I found that the default initialization uses a=sqrt(5), i.e. the a argument in the signature below:

torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')
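As far as I understand, a here is the negative slope of leaky_relu, not the gain itself; the gain works out to sqrt(2 / (1 + a^2)), which can be checked with nn.init.calculate_gain:

import math
import torch.nn as nn

# gain = sqrt(2 / (1 + a^2)) for leaky_relu
print(nn.init.calculate_gain('leaky_relu', math.sqrt(5)))  # ~0.577 (the default a=sqrt(5))
print(nn.init.calculate_gain('relu'))                      # ~1.414, i.e. sqrt(2) (ReLU, a=0)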

But based on the He et al. paper, if my activation functions are all ReLU, I should be using a gain of sqrt(2), which corresponds to a=0. So I set a=0 in my code, but got a NaN loss. What's the problem?