Hey!
I’m wrapping a simple MultiArmedBandit Gymnasium env I created.
The original Gymnasium env always returns the state 0, and its observation space is set to `spaces.Discrete(1)`.
However, when wrapping it with `GymWrapper`, the observations are always 1 instead of 0.
It seems the issue has something to do with TorchRL’s usage of `OneHotDiscreteTensorSpec`, which changes the state to 1 when calling `spec.encode` in TorchRL’s internals.
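To show what I suspect is going on, here is a plain-Python sketch of one-hot encoding. It is not TorchRL's actual code: `one_hot` and `one_hot_decode` are hypothetical helpers written only for illustration. With a single category, the one-hot vector for state 0 is `[1]`, so the 1 I see may simply be the encoded form of state 0 rather than a different state.

```python
def one_hot(index: int, n: int) -> list[int]:
    """Encode a categorical index in {0, ..., n-1} as a length-n one-hot vector.

    Hypothetical helper for illustration; not part of TorchRL.
    """
    if not 0 <= index < n:
        raise ValueError(f"index {index} is outside a space of size {n}")
    return [1 if i == index else 0 for i in range(n)]


def one_hot_decode(vector: list[int]) -> int:
    """Recover the categorical index from a one-hot vector."""
    return vector.index(1)


# Discrete(1) has exactly one state, 0, and its one-hot encoding is [1].
encoded = one_hot(0, 1)
print(encoded)                  # [1]
print(one_hot_decode(encoded))  # 0

# With more categories the pattern is easier to recognize.
print(one_hot(0, 3))            # [1, 0, 0]
```

If that is indeed the cause, the underlying state is still 0 and only its representation differs.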
How can it be fixed?
Much appreciated.