This is what my model's average gradient per layer looks like during training.
According to this graph, could vanishing gradients be a problem?