I am training a model on the CIFAR-10 dataset and need some clarification about the following observation.
As I perform backpropagation over successive epochs, the magnitude of the gradient of the loss function with respect to the parameters deeper in the network becomes negligibly small.
Let's say L is my loss and a is a parameter (of the fully connected layer); then over successive epochs, a.grad.norm() looks like this:
epoch 1: 0.2718
epoch 2: 0.0619
epoch 3: 0.0071
epoch 4: 0.0003
epoch 5: 7.0541e-06
epoch 6: 1.6031e-05
Does this mean I am facing the vanishing gradient problem?
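For context, here is a minimal sketch (with a hypothetical toy model, not my actual network) of how I am logging the per-parameter gradient norms after calling backward(), in case the measurement itself matters:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy model standing in for my actual CIFAR-10 network.
model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))

# One dummy forward/backward pass on random data.
x = torch.randn(8, 32)
targets = torch.randint(0, 10, (8,))
loss = F.cross_entropy(model(x), targets)
loss.backward()

# Log the gradient norm of every parameter, layer by layer.
for name, p in model.named_parameters():
    print(f"{name}: grad norm = {p.grad.norm().item():.4e}")
```

Comparing the norms across all layers this way (rather than just one fully connected parameter) should show whether only the deep layers shrink or the whole network's gradients do.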