I am training a QA model, which is not giving satisfactory results. The metrics are subpar. Upon investigation, I tried plotting the layer gradients and found this. I have used the function from this thread.
From the figure above, it seems as if the middle layers are not learning anything.
What should I try doing in order to fix this?