I built PyTorch 1.5 with CUDA 9.0 myself. Recently I encountered NaN/inf values during the network forward pass, but the same code works fine on other servers (PyTorch 1.5 + CUDA 10.2).
Is it possible for PyTorch > 1.4 built with CUDA 9.0 to behave anomalously?
- The random NaN was introduced when 1 of the 64 weights in the first conv layer suddenly became NaN.
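To narrow down which parameter is corrupted and when, one option is to scan the model's parameters for non-finite values after each step. This is a minimal sketch (the toy `nn.Sequential` model and the `find_bad_params` helper are illustrative, not from the original post); in real training you would call the check after each optimizer step to catch the first iteration where the weight goes bad:

```python
import torch
import torch.nn as nn

def find_bad_params(model):
    """Return the names of parameters containing any NaN or Inf entries."""
    bad = []
    for name, p in model.named_parameters():
        # torch.isfinite is False for both NaN and +/-Inf
        if not torch.isfinite(p).all():
            bad.append(name)
    return bad

# Hypothetical toy model standing in for the real network.
model = nn.Sequential(nn.Conv2d(3, 64, kernel_size=3), nn.ReLU())

# Deliberately corrupt one of the 64 filters to show the check firing.
with torch.no_grad():
    model[0].weight[0, 0, 0, 0] = float("nan")

print(find_bad_params(model))  # -> ['0.weight']
```

`torch.autograd.set_detect_anomaly(True)` can additionally pinpoint the backward operation that first produces a NaN, at the cost of slower training.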