Hey guys,

I am training a model on the CIFAR-10 dataset and I need some clarification about the following observation.

As backpropagation runs over successive epochs, the magnitude of the gradient of the loss function with respect to the parameters of the deeper layers becomes negligibly small.

Let's say L is my loss and a is a parameter of the fully connected layer. Then, over successive epochs, a.grad.norm() looks like this (a sketch of how such norms could be logged is shown after the list):

epoch 1: 0.2718

epoch 2: 0.0619

epoch 3: 0.0071

epoch 4: 0.0003

epoch 5: 7.0541e-06

epoch 6: 1.6031e-05
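
A minimal sketch of how such per-parameter gradient norms could be logged after each backward() call. The toy model, dummy batch, and loop structure here are illustrative assumptions, not my actual training code:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a CIFAR-10 classifier (assumed, not the real model)
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 256), nn.ReLU(),
    nn.Linear(256, 10),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Dummy batch standing in for a CIFAR-10 mini-batch
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))

for epoch in range(6):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    # Log the gradient norm of every parameter so that per-layer
    # norms can be compared across epochs
    for name, p in model.named_parameters():
        if p.grad is not None:
            print(f"epoch {epoch + 1}  {name}  grad norm = {p.grad.norm().item():.4e}")
    optimizer.step()
```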

Does this mean I am facing the vanishing gradient problem?