It is not a problem of bz, the weight update is unstable, and the gradient backpropagates, resulting in gradient explosion. But I tried gradient cropping before, and it didn’t work. Is there any other way to solve this problem?
It is not a problem of bz, the weight update is unstable, and the gradient backpropagates, resulting in gradient explosion. But I tried gradient cropping before, and it didn’t work. Is there any other way to solve this problem?