Hi everyone,
I’m currently training a TD3 agent using Stable-Baselines3 and I’ve run into a surprising issue related to NumPy versions. After a lot of debugging, I’ve noticed that my training results differ significantly depending on whether I use NumPy 1.26.0 or NumPy 2.2.6.
To be clear:
- I am 100% sure the difference comes from the NumPy version.
- I have checked all seeding procedures (environment seeds, PyTorch seeds, NumPy seeds, SB3 seeds, Python's `random`, etc.).
- The only variable that changes between the two runs is the NumPy version.
With NumPy 1.26.0, training is stable.
With NumPy 2.2.6, performance degrades slightly, even with identical seeds and hyperparameters.
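For reference, the seeding setup looks roughly like the sketch below (a minimal reconstruction, not my exact script; in the real run I also call `torch.manual_seed` and pass `seed=` to the SB3 model and to `env.reset`):

```python
import random
import numpy as np

def seed_everything(seed: int) -> None:
    # Seed Python's and NumPy's global RNGs; the full setup also seeds
    # PyTorch (torch.manual_seed) and SB3 (TD3(..., seed=seed)).
    random.seed(seed)
    np.random.seed(seed)

seed_everything(0)
a = np.random.rand(3)
seed_everything(0)
b = np.random.rand(3)
assert np.array_equal(a, b)  # identical seeds give identical draws within one run
```

Within a single NumPy version this is fully reproducible; the divergence only appears when the NumPy version itself changes.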
My questions:
- Has anyone experienced similar behavior when moving from NumPy 1.x to NumPy 2.x while using PyTorch or Stable-Baselines3?
- Is there any known change in NumPy 2.x (e.g., RNG, broadcasting semantics, linear algebra routines, memory layout, float behavior) that could explain such a discrepancy?
- Are there recommended workarounds, or is it currently safer to stay on the 1.26.x branch for reproducible RL experiments?
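One way I could try to narrow down the RNG question: fingerprint the random streams under each version and compare. This is a sketch of that idea (the function name is mine); running it under 1.26.0 and 2.2.6 and comparing the printed digests would show whether the streams themselves diverge or whether the difference comes from elsewhere (e.g., linear algebra or float handling):

```python
import hashlib
import numpy as np

def rng_fingerprint(seed: int = 0, n: int = 1000) -> str:
    # Draw a fixed number of samples from both the legacy RandomState API
    # and the newer Generator API, then hash the raw bytes. Run this once
    # per NumPy version; equal digests mean identical streams.
    np.random.seed(seed)
    legacy = np.random.rand(n)
    gen = np.random.default_rng(seed).random(n)
    return hashlib.sha256(legacy.tobytes() + gen.tobytes()).hexdigest()

print(np.__version__, rng_fingerprint())
```

The digest is deterministic for a given seed and NumPy version, so a mismatch between versions would point the blame at the RNG streams.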
Additional details:
- TD3 implementation: Stable-Baselines3
- PyTorch version: 2.9.1
- Python version: 3.13.8
- NumPy versions tested: 1.26.0 vs 2.2.6