I believe I am running into vanishing gradients and was hoping to see if anyone could help me either fix this or revive the model.
I am using a recreation of AlphaZero's network, implemented in PyTorch.
This started showing up a while back in my training iterations, but I never caught it early enough, so I'm looking to either fix it going forward or, if needed, modify some weights now.
I noticed it when I was compiling my model into a TensorRT engine via torch_tensorrt. It was giving me warnings that my model has weights smaller than the minimum representable float16 value, which would get clamped to that minimum during conversion.
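For context, this is roughly the compile call that produced the warnings (a sketch; the input shape below is a placeholder, not my real board dimensions):

```python
import torch
import torch_tensorrt

# placeholder input shape; my real model takes a board-state tensor
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 17, 8, 8))],
    enabled_precisions={torch.float16},  # fp16 conversion is what triggers the warnings
)
```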
Upon inspecting my weights, I noticed that about 90% or more of the values in my layers are near zero, hence my issue with vanishing weights.
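For reference, this is roughly how I checked (a minimal sketch; the 6.1e-5 cutoff is float16's smallest normal magnitude, which I believe is what TensorRT is warning about):

```python
import torch

FP16_MIN_NORMAL = 6.1e-5  # smallest normal float16 magnitude (2**-14)

for name, p in model.named_parameters():
    # fraction of this layer's weights too small to represent in float16
    frac = (p.detach().abs() < FP16_MIN_NORMAL).float().mean().item()
    print(f"{name}: {frac:.1%} of weights below float16 range")
```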
From my understanding of vanishing gradients, my model shouldn't have this problem, since the AlphaZero architecture is a residual network with batch normalization, both of which are supposed to mitigate vanishing gradients, but it seems that it does.
- Can I fix this going forward? Is anything obviously wrong here?
- Can I just set all of my vanishing weights to a minimum value to jumpstart them, or maybe re-draw them within a small range (see the sketch after this list)? I understand this could temporarily hurt the model, but I would hope that after a few iterations it could recover.
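To make the second question concrete, this is the kind of reset I have in mind (a sketch only; both thresholds are guesses, not tuned values):

```python
import torch

DEAD_THRESH = 1e-6  # guess: treat magnitudes below this as "dead"
JUMPSTART   = 1e-4  # guess: magnitude range to restart them in

with torch.no_grad():
    for name, p in model.named_parameters():
        mask = p.abs() < DEAD_THRESH
        if mask.any():
            # re-draw dead weights uniformly in [-JUMPSTART, JUMPSTART],
            # leaving the healthy weights untouched
            p[mask] = torch.empty_like(p[mask]).uniform_(-JUMPSTART, JUMPSTART)
            print(f"{name}: reset {int(mask.sum())} weights")
```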