I noticed that at a high learning rate, my model sometimes produces NaN at random in the test output:
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
That is to say, I can train it without any errors, but at random points while evaluating on the test set, the model's predictions contain NaN.
Is this known to happen, and if so, why?
You would have to specify what kind of model you are using to get a definitive answer, as there could be some other cause.
However, from what you describe, the learning rate does seem to be responsible. This happens because of the way gradient descent works.
Essentially, your large learning rate makes each update overshoot the minimum, and you are artificially causing something similar to the exploding gradient problem (which you can read more about here: https://machinelearningmastery.com/exploding-gradients-in-neural-networks/).
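To make the overshoot concrete, here is a minimal sketch (a toy example of my own, not your model): plain gradient descent on the one-parameter loss f(w) = w², whose gradient is 2w, run in float32 to match the dtype in your error. The function name `gradient_descent_quadratic` and the learning rates used are illustrative assumptions. Each step multiplies w by (1 - 2·lr), so for lr between 0 and 1 the iterates shrink toward the minimum at 0, while for a larger lr every step overshoots further than the last until the value overflows to inf and the next step computes inf - inf = NaN.

```python
import numpy as np


def gradient_descent_quadratic(lr, steps=200, w0=1.0):
    """Minimise the toy loss f(w) = w**2 (gradient 2*w) with plain gradient descent.

    Everything is kept in float32. Each update is w <- w - lr * 2 * w,
    i.e. w is multiplied by (1 - 2*lr) every step, so |w| shrinks when
    0 < lr < 1 and grows geometrically when lr > 1.
    Returns the value of w after every step (index 0 is the starting value).
    """
    w = np.float32(w0)
    lr = np.float32(lr)
    two = np.float32(2.0)
    history = [float(w)]
    # Silence NumPy's overflow/invalid-value warnings so the
    # finite -> inf -> nan progression simply shows up in the history.
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(steps):
            grad = two * w        # gradient of w**2
            w = w - lr * grad     # gradient descent update
            history.append(float(w))
    return history


small = gradient_descent_quadratic(lr=0.1)
large = gradient_descent_quadratic(lr=1.5)
print("lr=0.1 final w:", small[-1])
print("lr=1.5 final w:", large[-1])
```

With `lr=0.1` the final value is essentially 0. With `lr=1.5` the iterates alternate in sign while doubling in magnitude, overflow to inf, and then become NaN, which stays NaN for every later step; any prediction computed from such a weight is NaN as well.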
In the exploding gradient problem, error gradients accumulate as they are propagated back through a deep network and result in large updates, which in turn produce infinite values or NaNs. In your case, the large updates are a direct result of the large learning rate, and those updates cause your NaNs. If you have a large network, though, it could be a combination of both.
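The depth effect can be sketched the same way, assuming a hypothetical toy network of `depth` scalar linear layers that each multiply their input by the same `weight`. By the chain rule, the gradient of the output with respect to the input is then `weight ** depth`, so any per-layer factor above 1 grows geometrically with depth. `chain_gradient` is an illustrative name, not a library function, and real networks are of course not this simple.

```python
import numpy as np


def chain_gradient(depth, weight):
    """Backpropagate d(output)/d(input) through `depth` toy scalar layers.

    Hypothetical network: each layer computes h <- weight * h, so the
    chain rule contributes one factor of `weight` per layer and the
    gradient is the product of `depth` copies of `weight` (in float32).
    """
    grad = np.float32(1.0)
    w = np.float32(weight)
    # Ignore overflow warnings; an overflowing product just becomes inf.
    with np.errstate(over="ignore"):
        for _ in range(depth):
            grad = grad * w  # one chain-rule factor per layer
    return float(grad)


print(chain_gradient(10, 1.5))
print(chain_gradient(300, 1.5))
```

For example, `chain_gradient(10, 1.5)` is about 57.7, while `chain_gradient(300, 1.5)` exceeds the float32 range and comes back as inf. Multiplying a gradient like that by a large learning rate only makes the update bigger, which is why the two causes can compound.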