Small gradient problems

Hi all, I am having trouble with my model not training. I have printed the median magnitude of the gradient for each parameter in the network, and they are usually 0, and when they are not, they are very small (on the order of 10^-6).

for epoch in range(num_epochs):
    # print the median gradient magnitude of every parameter
    # (None before the first backward pass)
    for name, param in net.named_parameters():
        print(name, torch.median(torch.abs(param.grad)).data[0] if param.grad is not None else None)
    running_loss = 0.0
    for i, data_batch in enumerate(trainloader, 0):
        inputs, labels = data_batch
        labels = to_one_hot(labels, num_classes)
        labels_clone = labels.clone()
        inputs, labels = inputs.type(dtype), labels.type(dtype)
        inputs, labels = Variable(inputs), Variable(labels)
        optimizer.zero_grad()
        outputs = net(inputs)
        outputs_clone = outputs.clone().data  # detached copy for the metric
        metric_value = metric.update(outputs_clone, labels_clone)
        loss = criterion(outputs, labels, metric_value)
        running_loss += loss.data[0]
        loss.backward()
        optimizer.step()
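
(For reference, to_one_hot just turns the integer class labels into one-hot float vectors, roughly like this:)

def to_one_hot(labels, num_classes):
    # labels: LongTensor of shape (batch,); returns FloatTensor of shape (batch, num_classes)
    one_hot = torch.zeros(labels.size(0), num_classes)
    one_hot.scatter_(1, labels.view(-1, 1), 1.0)
    return one_hot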

I am training the All Convolutional Network (https://github.com/StefOe/all-conv-pytorch) on CIFAR-100, with PyTorch 0.3.1.

Here is an example of the gradients that I get:

124.51171112060547
q1  	 epoch: 2  	 train_acc =  0.010145833333333333  	 val acc =  0.009166666666666667  	 loss =  186765.4358444214
conv1.weight 5.883157427888364e-07
conv1.bias 4.06868912250502e-06
conv2.weight 1.669390883307642e-07
conv2.bias 1.2699252692982554e-05
conv3.weight 2.7531811497283343e-07
conv3.bias 5.92384094488807e-06
conv4.weight 3.003013944180566e-07
conv4.bias 2.1904938876105007e-06
conv5.weight 3.168307500800438e-07
conv5.bias 1.6316347455358482e-06
conv6.weight 3.1667983080296835e-07
conv6.bias 1.25941642181715e-06
conv7.weight 3.2192451726587024e-07
conv7.bias 1.1171061942150118e-06
conv8.weight 3.980393898928014e-07
conv8.bias 1.3634778497362277e-06
class_conv.weight 6.011470077282866e-08
class_conv.bias 5.082645202492131e-07

Solved: with this many layers, the activations going into the softmax were overflowing, which left the gradients near zero. Adding a batch normalisation layer before the softmax fixed it.
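
In case it is useful to anyone, the change amounts to something like the following in the classifier head (a sketch only: the 192 input channels and the layer names are assumptions based on the all-conv architecture, not my exact code):

import torch.nn as nn
import torch.nn.functional as F

class ClassifierHead(nn.Module):
    def __init__(self, in_channels=192, num_classes=100):
        super(ClassifierHead, self).__init__()
        self.class_conv = nn.Conv2d(in_channels, num_classes, 1)
        self.bn = nn.BatchNorm2d(num_classes)  # the added layer

    def forward(self, x):
        x = self.class_conv(x)
        x = self.bn(x)                     # keeps the pre-softmax activations in a sane range
        x = F.avg_pool2d(x, x.size(2))     # global average pooling
        x = x.view(x.size(0), -1)
        return F.softmax(x, dim=1)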
