Categorical vs Bernoulli in solving CartPole

This is related to a question I asked a short while ago; I thought it was too much to put in one question.

Here is the relevant piece of code; the whole script is linked at the end. It works with a Bernoulli distribution. But when I tried Categorical, changing the network to two outputs and adjusting the rest accordingly, it doesn’t learn. Bernoulli still works, perhaps even better, when I use the second output of the changed network as the probability. It’s confusing to me.
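
For context on why I expected the two setups to behave the same: a two-way softmax over scores (z0, z1) gives action 1 the probability sigmoid(z1 - z0), so a Categorical over two outputs and a Bernoulli over one output describe the same family of policies. A small check of that identity in plain Python (the helper names and the example scores are only illustrative, not taken from my script):

```python
import math

def sigmoid(z):
    # Logistic function: maps a real score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def softmax2(z0, z1):
    # Two-class softmax, shifted by the max score for numerical stability.
    m = max(z0, z1)
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)

z0, z1 = 0.3, 1.1                # arbitrary example scores
p0, p1 = softmax2(z0, z1)        # Categorical view: P(action=0), P(action=1)
p_bern = sigmoid(z1 - z0)        # Bernoulli view: P(action=1)
```

So in principle the choice of distribution alone should not decide whether the agent learns, which is why the difference in behavior puzzles me.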


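import torch
import torch.nn as nn
from torch.distributions import Bernoulli

# env is the CartPole environment created in the full script
model = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 24),
    nn.ReLU(),
    nn.Linear(24, 36),
    nn.ReLU(),
    nn.Linear(36, 1),
    nn.Sigmoid(),
)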

for step in range(1000):
    state = torch.from_numpy(next_state).float()

    probs = model(state)
    dist = Bernoulli(probs)
    action = dist.sample()

    next_state, reward, done, __ = env.step(int(action.item()))
    recorder.record(state, action, reward)

    if done:
        break
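
For comparison, here is a minimal sketch of what a two-output Categorical variant of the same policy could look like. It is not the exact code from the gist: `model_cat`, `obs_dim`, and the placeholder state are assumptions made for illustration, and the network outputs raw scores (logits) with no Sigmoid on the last layer.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

obs_dim = 4  # assumption: size of the CartPole observation vector

# Same hidden layers as above, but two raw scores (one per action)
# and no Sigmoid on the output.
model_cat = nn.Sequential(
    nn.Linear(obs_dim, 24),
    nn.ReLU(),
    nn.Linear(24, 36),
    nn.ReLU(),
    nn.Linear(36, 2),
)

state = torch.zeros(obs_dim)        # placeholder state, for illustration only
logits = model_cat(state)
dist = Categorical(logits=logits)   # softmax over the two scores is applied internally
action = dist.sample()              # int64 tensor holding 0 or 1
log_prob = dist.log_prob(action)    # scalar tensor, same shape as action
env_action = int(action.item())     # plain int, as passed to env.step
```

Two things differ from the Bernoulli version and may be worth checking in the rest of the script: the sampled action is an integer (int64) tensor rather than a float tensor, and `log_prob` expects actions whose shape matches the batch shape of the distribution.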

The whole script is here:
https://gist.github.com/mtian2018/5dc5e69dda5666c4655676bac4dad996