On this page (deeplearning4j.org) it says:
A good standard deviation for the activations is on the order of 0.5 to 2.0. Significantly outside of this range may indicate one of the problems mentioned above.
How can I find the std of my activations using PyTorch?
I can find the gradients like this. They look a bit small to me, so I want to investigate further. Or are they actually fine?
import numpy as np

for name, param in model.named_parameters():
    print("Layer name: ", name)
    print("Grad Max Value: ", np.amax(param.grad.numpy()))
    print("Grad Min Value: ", np.amin(param.grad.numpy()))
Grad Max Value: 0.002665516
Grad Min Value: -0.0016368543
Grad Max Value: 0.0009877739
Grad Min Value: -0.0004688645
Grad Max Value: 1.4279677e-06
Grad Min Value: -0.0024869342
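For reference, the same loop can also print each gradient's standard deviation next to the min/max. A minimal, self-contained sketch (the toy model here is just a stand-in so the snippet runs on its own; the backward pass must have been run first so the gradients exist):

```python
import torch
import torch.nn as nn

# toy model and a backward pass so that .grad is populated
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
loss = model(torch.randn(16, 8)).sum()
loss.backward()

for name, param in model.named_parameters():
    if param.grad is not None:
        g = param.grad
        print(f"{name}: std={g.std():.6f} max={g.max():.6f} min={g.min():.6f}")
```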
You could use forward hooks to print the std deviation of the layer outputs:
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 3, 1, 1)
        self.pool1 = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(6, 1, 3, 1, 1)
        self.pool2 = nn.MaxPool2d(2)
        self.fc = nn.Linear(6*6, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool1(x)
        x = F.relu(self.conv2(x))
        x = self.pool2(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x
model = MyModel()
model.conv1.register_forward_hook(lambda m, x, out: print(out.std()))
model.conv2.register_forward_hook(lambda m, x, out: print(out.std()))
model.fc.register_forward_hook(lambda m, x, out: print(out.std()))
x = torch.randn(1, 3, 24, 24)
output = model(x)
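If you would rather collect the statistics than print them, the same idea works with a hook factory that writes into a dict; the handles returned by `register_forward_hook` let you remove the hooks afterwards. A small sketch (the `make_hook` helper and `stats` dict are illustrative names, not a PyTorch API):

```python
import torch
import torch.nn as nn

stats = {}

def make_hook(name):
    # closure captures the layer name so each hook writes to its own key
    def hook(module, inputs, output):
        stats[name] = output.std().item()
    return hook

model = nn.Sequential(nn.Conv2d(3, 6, 3, 1, 1), nn.ReLU(),
                      nn.Conv2d(6, 1, 3, 1, 1))
handles = [m.register_forward_hook(make_hook(f"layer{i}"))
           for i, m in enumerate(model)]

model(torch.randn(1, 3, 24, 24))
print(stats)

# clean up: hooks stay registered until their handles are removed
for h in handles:
    h.remove()
```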
Cool, thanks! Just what I was looking for.
Any comments on my gradients? Or is this a case-to-case thing where there are no right or wrong gradients?
I’m glad it’s working.
Stanford’s CS231n states that the ratio of update magnitudes to weight magnitudes should be roughly 1e-3.
I’m not sure if that’s still a valid assumption today, especially with some tricks and hacks used to accelerate training.