Hello,
In my classification project, I checked the gradient flow following the answer in Check gradient flow in network - #7 by RoshanRane.
The network structure is:
CNN layers: [c1-c7]
batch-normalization layers: [b1-b7]
The ReLU activation function sits between the batch-normalization and CNN layers.
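For clarity, here is a minimal sketch of how I think of the block structure (the channel widths and kernel sizes are placeholders, and the conv -> batch-norm -> ReLU ordering within each block is my assumption):

```python
import torch.nn as nn

class SmallCNN(nn.Module):
    """Sketch of the described network: 7 conv blocks (c1-c7 / b1-b7) + linear output."""

    def __init__(self, in_channels=3, num_classes=10):
        super().__init__()
        # Placeholder channel widths for c1-c7
        channels = [in_channels, 16, 32, 64, 64, 128, 128, 256]
        blocks = []
        for i in range(7):  # c1-c7 with b1-b7
            blocks += [
                nn.Conv2d(channels[i], channels[i + 1], kernel_size=3, padding=1),
                nn.BatchNorm2d(channels[i + 1]),
                nn.ReLU(inplace=True),
            ]
        self.features = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Output layer (not included in the gradient flow plot below)
        self.output_layer = nn.Linear(channels[-1], num_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.pool(x).flatten(1)
        return self.output_layer(x)
```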
For analysing the gradient flow, I plotted only these layers:
c1, b1, c2, b2, c3, b3, c4, b4, c5, b5, c6, b6, c7, b7
Then there is one output layer [a linear layer], which is not included in the gradient flow graph.
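The helper I use for the plot is along the lines of the one from that thread (a simplified sketch; call it after `loss.backward()` and before `optimizer.step()`):

```python
import matplotlib.pyplot as plt

def plot_grad_flow(named_parameters):
    """Plot the average absolute gradient of each weight tensor.

    Usage: plot_grad_flow(model.named_parameters()) right after backward().
    """
    ave_grads, layers = [], []
    for name, param in named_parameters:
        # Skip biases and parameters that received no gradient
        if param.requires_grad and "bias" not in name and param.grad is not None:
            layers.append(name)
            ave_grads.append(param.grad.abs().mean().item())
    plt.plot(ave_grads, alpha=0.6, color="b")
    plt.hlines(0, 0, len(ave_grads) - 1, linewidth=1, color="k")
    plt.xticks(range(len(ave_grads)), layers, rotation="vertical")
    plt.xlim(left=0, right=len(ave_grads) - 1)
    plt.xlabel("Layers")
    plt.ylabel("Average gradient")
    plt.title("Gradient flow")
    plt.grid(True)
    plt.tight_layout()
    plt.show()
```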
There is not much change in the average gradient of the CNN layers [c5-c7] over the last epochs,
and this is the loss graph for 50 epochs:
If the gradient is very small or close to 0, that suggests a vanishing gradient.
But the actual definition of the vanishing gradient is: the gradient may be so small by the time it reaches the layers close to the input of the model that it has very little effect.
In my situation, it is the other way around: the gradients are almost zero in the layers near the output layer, while the gradients still change in the layers near the input layer.
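As I understand it, the reason that definition points at the input side is the chain rule (a sketch, where h_l is the activation of layer l, h_n the last layer's activation, and L the loss):

```latex
\frac{\partial L}{\partial h_l}
  = \frac{\partial L}{\partial h_n}
    \prod_{k=l}^{n-1} \frac{\partial h_{k+1}}{\partial h_k}
```

The earlier the layer (the smaller l is), the more Jacobian factors get multiplied together, so shrinking gradients are normally expected near the input, not near the output.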
How is this possible, and how can I solve the vanishing gradient in the layers near the output layer?
thanks