Hi,
I’ve been thinking for a while about switching from the OpenAI Baselines PER implementation to torchrl.data.PrioritizedReplayBuffer, since it should be faster, given that the current implementation needs to convert batches to torch format anyway.
In the original paper, the parameters alpha and beta change over time. I’m also aware that fixed alpha and beta can work better in some cases (as reported in Dopamine/Rainbow).
Since I don’t see an official way to schedule alpha and beta in the documentation, I’m curious whether this was left out deliberately for simplicity, based on the belief that constant alpha and beta work better, whether there is another reason, or whether scheduling is actually possible?
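For context, the kind of schedule I have in mind is the linear beta annealing from the original PER paper (beta ramping from an initial value up to 1.0 over training). A minimal sketch of that schedule (function name and defaults are my own, not from torchrl):

```python
def beta_schedule(step: int, total_steps: int,
                  beta0: float = 0.4, beta_final: float = 1.0) -> float:
    """Linearly anneal beta from beta0 to beta_final over total_steps.

    This mirrors the importance-sampling annealing described in the PER
    paper; beta0=0.4 is the value used there for the rank-based variant.
    """
    frac = min(step / total_steps, 1.0)  # clamp once training exceeds total_steps
    return beta0 + frac * (beta_final - beta0)
```

In principle one could call something like this every training step and push the result into the buffer’s sampler, assuming torchrl exposes a way to update beta after construction, but I haven’t found a documented hook for that, hence the question.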