Performance Differences in TD3 Training When Switching from NumPy 1.26.0 to NumPy 2.2.6

Hi everyone,

I’m currently training a TD3 agent with Stable-Baselines3 and have run into a surprising issue. After a lot of debugging, I’ve found that my training results differ significantly depending on whether I use NumPy 1.26.0 or NumPy 2.2.6.

To be clear:

  • I am 100% sure the difference comes from the NumPy version.

  • I have checked all seeding procedures (environment seeds, PyTorch seeds, NumPy seeds, SB3 seeds, Python’s random, etc.).

  • The only changed variable between the two runs is the NumPy version.
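As a quick sanity check on the seeding claim above, it is worth verifying that NumPy’s own random stream is bit-identical under both versions. Generator streams are stable across releases under NumPy’s stream-compatibility policy (NEP 19), so the same seed should print the same draws under 1.26.0 and 2.2.6; if it does, the divergence is unlikely to come from the RNG itself. The seed 12345 is arbitrary, chosen only for this check:

```python
import numpy as np

# NumPy's Generator streams are bit-for-bit reproducible across releases
# (NEP 19 stream-compatibility policy), so identical output under both
# versions rules out the RNG as the source of the divergence.
rng = np.random.default_rng(12345)
print(np.__version__, rng.random(3))
```

Running this in each environment and diffing the printed arrays takes a minute and narrows the search considerably.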

With NumPy 1.26.0, training is stable.
With NumPy 2.2.6, final performance is consistently slightly worse, even with identical seeds and hyperparameters.


My questions

  • Has anyone already experienced similar behavior when moving from NumPy 1.x to NumPy 2.x while using PyTorch or Stable-Baselines3?

  • Is there any known change in NumPy 2.x (e.g., RNG, broadcasting semantics, linear algebra routines, memory layout, float behavior, etc.) that could explain such a discrepancy?

  • Are there recommended workarounds, or is it currently safer to remain on the 1.26.x branch for reproducible RL experiments?
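One documented NumPy 2.x change that can silently alter numerics is NEP 50, the revised type-promotion rules: some mixed scalar operations that upcast to float64 under value-based promotion in 1.x now keep the narrower dtype. A minimal sketch of the canonical case from the NumPy 2.0 migration guide, in case promotion differences are worth auditing in the observation/reward pipeline:

```python
import numpy as np

# Under NumPy 1.x value-based promotion, an np.float32 scalar plus a
# Python float yields float64. Under NumPy 2.x (NEP 50), the result
# keeps float32, so intermediate math can lose precision relative to 1.x.
x = np.float32(3.0) + 3.0
print(np.__version__, x.dtype)  # float64 on 1.x, float32 on 2.x
```

If your environment wrappers or reward scaling mix Python floats with float32 arrays or scalars, this kind of precision change could plausibly compound over millions of steps, though I can’t say it explains your specific gap.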

Additional details

  • TD3 implementation: Stable-Baselines3

  • PyTorch version: 2.9.1

  • Python version: 3.13.8

  • NumPy versions tested: 1.26.0 vs 2.2.6
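For side-by-side comparisons like the one above, it may help to log the full stack from inside each run. A minimal sketch (extend with `torch.__version__`, SB3’s version, etc. as needed):

```python
import platform
import numpy as np

# Print the interpreter and NumPy versions so logs from the two
# environments can be compared side by side.
print("Python:", platform.python_version())
print("NumPy:", np.__version__)
```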