Performance Differences in TD3 Training When Switching from NumPy 1.26.0 to NumPy 2.2.6

Hi everyone,

I’m currently training a TD3 agent with Stable-Baselines3 and have run into a surprising issue. After a lot of debugging, I’ve found that my training results differ significantly depending on whether I use NumPy 1.26.0 or NumPy 2.2.6.

To be clear:

  • I am 100% sure the difference comes from the NumPy version.

  • I have checked all seeding procedures (environment seeds, PyTorch seeds, NumPy seeds, SB3 seeds, Python’s random, etc.).

  • The only changed variable between the two runs is the NumPy version.
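As a quick sanity check on the seeding claim above, it is worth verifying that NumPy’s own random stream is bit-identical under both versions. Generator streams are stable across releases under NumPy’s stream-compatibility policy (NEP 19), so the same seed should print the same draws under 1.26.0 and 2.2.6; if it does, the divergence is unlikely to come from the RNG itself. The seed 12345 is arbitrary, chosen only for this check:

```python
import numpy as np

# NumPy's Generator streams are bit-for-bit reproducible across releases
# (NEP 19 stream-compatibility policy), so identical output under both
# versions rules out the RNG as the source of the divergence.
rng = np.random.default_rng(12345)
print(np.__version__, rng.random(3))
```

Running this in each environment and diffing the printed arrays takes a minute and narrows the search considerably.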

With NumPy 1.26.0, training is stable.
With NumPy 2.2.6, final performance is consistently slightly worse, even with identical seeds and hyperparameters.


My questions

  • Has anyone already experienced similar behavior when moving from NumPy 1.x to NumPy 2.x while using PyTorch or Stable-Baselines3?

  • Is there any known change in NumPy 2.x (e.g., RNG, broadcasting semantics, linear algebra routines, memory layout, float behavior, etc.) that could explain such a discrepancy?

  • Are there recommended workarounds, or is it currently safer to remain on the 1.26.x branch for reproducible RL experiments?
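One documented NumPy 2.x change that can silently alter numerics is NEP 50, the revised type-promotion rules: some mixed scalar operations that upcast to float64 under value-based promotion in 1.x now keep the narrower dtype. A minimal sketch of the canonical case from the NumPy 2.0 migration guide, in case promotion differences are worth auditing in the observation/reward pipeline:

```python
import numpy as np

# Under NumPy 1.x value-based promotion, an np.float32 scalar plus a
# Python float yields float64. Under NumPy 2.x (NEP 50), the result
# keeps float32, so intermediate math can lose precision relative to 1.x.
x = np.float32(3.0) + 3.0
print(np.__version__, x.dtype)  # float64 on 1.x, float32 on 2.x
```

If your environment wrappers or reward scaling mix Python floats with float32 arrays or scalars, this kind of precision change could plausibly compound over millions of steps, though I can’t say it explains your specific gap.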

Additional details

  • TD3 implementation: Stable-Baselines3

  • PyTorch version: 2.9.1

  • Python version: 3.13.8

  • NumPy versions tested: 1.26.0 vs 2.2.6
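For side-by-side comparisons like the one above, it may help to log the full stack from inside each run. A minimal sketch (extend with `torch.__version__`, SB3’s version, etc. as needed):

```python
import platform
import numpy as np

# Print the interpreter and NumPy versions so logs from the two
# environments can be compared side by side.
print("Python:", platform.python_version())
print("NumPy:", np.__version__)
```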