This is related to the question I asked a short while ago; I thought it was too much for one question.
Here is the relevant code piece (the whole script is at the end). It works with a Bernoulli distribution. But when I tried Categorical, changing the network output to two units (among other changes), it doesn't learn. Bernoulli still works, perhaps even better when I use the second item of the changed network's output as the probability. It's confusing to me.
```python
model = nn.Sequential(
    nn.Linear(4, 32),   # layer sizes illustrative; see full script below
    nn.ReLU(),
    nn.Linear(32, 1),
    nn.Sigmoid(),       # single output: the Bernoulli probability
)

for step in range(1000):
    state = torch.from_numpy(next_state).float()
    probs = model(state)
    dist = Bernoulli(probs)
    action = dist.sample()
    next_state, reward, done, __ = env.step(int(action.item()))
    recorder.record(state, action, reward)
    if done:
        break
```
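For comparison, the Categorical variant I tried looks roughly like this. This is a sketch under my assumptions: a two-unit output with softmax, illustrative layer sizes, and a dummy state tensor standing in for the environment observation.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

# Hypothetical two-output policy for Categorical; sizes are illustrative.
model = nn.Sequential(
    nn.Linear(4, 32),
    nn.ReLU(),
    nn.Linear(32, 2),    # two logits, one per action
    nn.Softmax(dim=-1),  # convert to action probabilities
)

state = torch.zeros(4)   # stand-in for an environment observation
probs = model(state)     # shape (2,), sums to 1
dist = Categorical(probs)
action = dist.sample()   # tensor containing 0 or 1, usable as env action
```

With Categorical, `dist.sample()` returns the index of the chosen action directly, so no thresholding of a probability is needed.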
The whole script is here: