This is related to a question I asked a short while ago; I thought it was too much for one question.
Here is a code snippet; the whole script is linked at the end. It works with a Bernoulli distribution, but when I changed to a Categorical distribution (switching the network to two outputs and making the related changes), it doesn't learn. Bernoulli still works, perhaps even better when I use the second output of the changed network as the probability. This is confusing to me.
…
model = nn.Sequential(
nn.Linear(env.observation_space.shape[0], 24),
nn.ReLU(),
nn.Linear(24, 36),
nn.ReLU(),
nn.Linear(36, 1),
nn.Sigmoid(),
)
…
for step in range(1000):
    state = torch.from_numpy(next_state).float()
    probs = model(state)
    dist = Bernoulli(probs)
    action = dist.sample()
    next_state, reward, done, __ = env.step(int(action.item()))
    recorder.record(state, action, reward)
    if done:
        break
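For context, the Categorical version I tried looks roughly like this. It is only a sketch: the two-output head and the placeholder state are my reconstruction, not the exact code from the script, and I assume CartPole's 4-dimensional observation space.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

# Two-output network for Categorical; obs size 4 assumed (CartPole)
model = nn.Sequential(
    nn.Linear(4, 24),
    nn.ReLU(),
    nn.Linear(24, 36),
    nn.ReLU(),
    nn.Linear(36, 2),
    nn.Softmax(dim=-1),  # probabilities over the two actions
)

state = torch.zeros(4)          # placeholder state, for illustration only
probs = model(state)            # shape (2,), sums to 1
dist = Categorical(probs)
action = dist.sample()          # scalar tensor in {0, 1}
log_prob = dist.log_prob(action)  # used later for the policy-gradient loss
```

With `Categorical`, `action.item()` can be passed to `env.step` directly, and the loss term becomes `-log_prob * return` as with Bernoulli.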
The whole script is here:
https://gist.github.com/mtian2018/5dc5e69dda5666c4655676bac4dad996