I believe that you’re asking why calls to Categorical.sample() (and
things computed from them) aren’t reproducible.

This is to be expected as .sample() returns pseudorandom samples
from the Categorical distribution you instantiated. These samples
are supposed to be (randomly) different from one another. If you want
a series of .sample()s to be reproducible, you may reset the underlying
random-number generator with torch.manual_seed(). Consider:

>>> import torch
>>> torch.__version__
'2.2.2'
>>> _ = torch.manual_seed(2024)
>>> m = torch.distributions.Categorical(torch.tensor([0.25, 0.25, 0.25, 0.25]))
>>> m.sample(), m.sample() # two samples
(tensor(2), tensor(1))
>>> m.sample(), m.sample() # two more samples -- different result
(tensor(3), tensor(2))
>>> _ = torch.manual_seed(2024) # reset random number generator
>>> m.sample(), m.sample() # same two samples as at the beginning
(tensor(2), tensor(1))
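If you want reproducible samples without disturbing the global generator state used by the rest of your program, you can also isolate the seeding inside torch.random.fork_rng(), which restores the previous RNG state on exit. A minimal sketch:

```python
import torch

m = torch.distributions.Categorical(torch.tensor([0.25, 0.25, 0.25, 0.25]))

# fork_rng() saves the global RNG state and restores it when the
# block exits, so the seeding below doesn't leak out.
with torch.random.fork_rng():
    torch.manual_seed(2024)
    a = (m.sample(), m.sample())

with torch.random.fork_rng():
    torch.manual_seed(2024)
    b = (m.sample(), m.sample())

# both forks were seeded identically, so the sample pairs agree
print(a, b)
```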

choose_action() is a member function of my PPO class. I declare a ppo object and call its choose_action() inside my evaluate() function.
If I call .sample() many times before evaluate(), ppo gives a better reward.

I know that .sample() is not reproducible, but the problem is that .sample() seems to have memory: calling .sample() many times before calling choose_action() gives me a better reward.

This is just* a (pseudo) statistical happenstance caused by the different,
but statistically-equivalent, (pseudo) random samples you get after
“advancing the random-number-generator state” by calling .sample()
a bunch of times (and discarding the results) before performing your
actual computation.

.sample() (or, more precisely, an instance of Categorical) does not
have memory. (PyTorch’s random-number generator does have state.
That state changes the specific values of the samples generated, but
not their statistical characteristics.)

Try this experiment, say ten to one hundred times. Start a new python
session and import torch. This initializes PyTorch’s random-number
generator to a new, “random,” state. Either compute your reward or
call .sample() a bunch of times and then compute your reward. You
will find that there is no statistical difference between the rewards you
get with and without calling .sample() a bunch of times.
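Here is a sketch of that experiment. I don’t have your PPO code, so toy_reward() below is a hypothetical stand-in for your evaluate(); torch.seed() reseeds the generator from fresh entropy, playing the role of starting a new session:

```python
import torch

def toy_reward(m, n=1000):
    # hypothetical stand-in for your evaluate() / reward computation:
    # the mean of n categorical samples
    return m.sample((n,)).float().mean().item()

def run_trial(burn):
    torch.seed()  # fresh "random" RNG state, like a new session
    m = torch.distributions.Categorical(torch.tensor([0.25, 0.25, 0.25, 0.25]))
    for _ in range(burn):
        m.sample()  # advance the RNG state, discard the results
    return toy_reward(m)

rewards_plain = [run_trial(burn=0) for _ in range(20)]
rewards_burned = [run_trial(burn=100) for _ in range(20)]

# burning samples changes which values you get, not their statistics,
# so the two averages agree to within sampling noise
print(sum(rewards_plain) / 20, sum(rewards_burned) / 20)
```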

*) There is some lore that the quality of the pseudorandom numbers
initially produced by PyTorch’s random-number generator is reduced if
the seed has only low-order bits set. So, if you want to be extra careful
about this (I don’t bother), you should use a seed drawn from basically
the whole range of a 64-bit integer.
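For example, a sketch of that “extra careful” option, drawing the seed from python’s standard random module rather than using a small constant like 2024:

```python
import random
import torch

# draw a large seed (63 bits, so it stays non-negative) instead of
# a small constant with only low-order bits set
seed = random.getrandbits(63)
_ = torch.manual_seed(seed)

m = torch.distributions.Categorical(torch.tensor([0.25, 0.25, 0.25, 0.25]))
s = m.sample()
```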