How to implement a reinforcement learning agent that acts at different time steps using PyTorch

uBoscuBo · July 6, 2021, 4:43pm

I was wondering how you would change how often the agent interacts with the environment. Looking at previous custom environments and Gym environments, it looks like the agent interacts with the environment once per time step. For my purposes, I want my agent to interact with the environment only once per episode.
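In a hand-written training loop (the usual pattern in PyTorch code, since core PyTorch does not ship an environment API of its own), how often the agent acts is just a property of the loop. Below is a minimal sketch, assuming a hypothetical `ToyEnv` with a Gym-style `reset()`/`step()` interface and a hypothetical `run_episode` helper: the policy is queried only every `decision_interval` steps and its last action is repeated in between, so `decision_interval=None` means one decision per episode.

```python
class ToyEnv:
    """Hypothetical stand-in for a custom environment with a Gym-style API:
    reset() -> obs, step(action) -> (obs, reward, done, info)."""

    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.episode_length
        return self.t, reward, done, {}


def run_episode(env, policy, decision_interval=None):
    """Run one episode, querying `policy` only every `decision_interval` steps
    (None = once, at the start of the episode) and repeating the last action
    in between. Returns (total_reward, num_steps, num_decisions)."""
    obs = env.reset()
    total_reward, steps, decisions = 0.0, 0, 0
    action, done = None, False
    while not done:
        new_decision_due = decision_interval is not None and steps % decision_interval == 0
        if action is None or new_decision_due:
            action = policy(obs)  # e.g. a PyTorch network's forward pass
            decisions += 1
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps, decisions


env = ToyEnv(episode_length=10)
always_one = lambda obs: 1  # trivial placeholder policy
print(run_episode(env, always_one))                       # (10.0, 10, 1)
print(run_episode(env, always_one, decision_interval=1))  # (10.0, 10, 10)
```

Any PyTorch model can play the role of `policy` (a function that runs a network on `obs` and returns an action); the environment itself does not change, only the loop decides when a new action is requested.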

It looks like TensorFlow (via TF-Agents) has built-in functionality for controlling how frequently the agent interacts with the environment. Please see the following code as an example:

```python
import numpy as np
import tensorflow as tf
from tf_agents.environments import suite_gym, tf_py_environment

# Load CartPole through TF-Agents' Gym suite and wrap it as a TF environment
env = suite_gym.load('CartPole-v0')
tf_env = tf_py_environment.TFPyEnvironment(env)

time_step = tf_env.reset()
rewards = []
steps = []
num_episodes = 5

for _ in range(num_episodes):
    episode_reward = 0
    episode_steps = 0
    # One agent-environment interaction per time step until the episode ends
    while not time_step.is_last():
        action = tf.random.uniform([1], 0, 2, dtype=tf.int32)  # random action in {0, 1}
        time_step = tf_env.step(action)
        episode_steps += 1
        episode_reward += time_step.reward.numpy()
    rewards.append(episode_reward)
    steps.append(episode_steps)
    time_step = tf_env.reset()

num_steps = np.sum(steps)
avg_length = np.mean(steps)
avg_reward = np.mean(rewards)

print('num_episodes:', num_episodes, 'num_steps:', num_steps)
print('avg_length', avg_length, 'avg_reward:', avg_reward)
```
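The TF-Agents loop above is itself an ordinary Python `while` loop around `tf_env.step`, so the same structure carries over without TF-Agents. A minimal stdlib-only sketch of the same evaluation, assuming a hypothetical `ToyEnv` that pays reward 1.0 per step (like CartPole) and ends after a fixed number of steps, and a hypothetical `evaluate_random_policy` helper:

```python
import random
from statistics import mean


class ToyEnv:
    """Hypothetical environment with a Gym-style API; every step pays
    reward 1.0 (as in CartPole) and episodes last a fixed number of steps."""

    def __init__(self, episode_length=20):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        return self.t, 1.0, done, {}


def evaluate_random_policy(env, num_episodes=5, seed=0):
    """Random actions in {0, 1}, collecting per-episode reward and length.
    Returns (num_steps, avg_length, avg_reward)."""
    rng = random.Random(seed)
    rewards, steps = [], []
    for _ in range(num_episodes):
        env.reset()
        episode_reward, episode_steps, done = 0.0, 0, False
        while not done:
            action = rng.randint(0, 1)  # plays the role of tf.random.uniform([1], 0, 2)
            _obs, reward, done, _info = env.step(action)
            episode_steps += 1
            episode_reward += reward
        rewards.append(episode_reward)
        steps.append(episode_steps)
    return sum(steps), mean(steps), mean(rewards)


num_steps, avg_length, avg_reward = evaluate_random_policy(ToyEnv(), num_episodes=5)
print('num_steps:', num_steps, 'avg_length', avg_length, 'avg_reward:', avg_reward)
```

Because the loop is plain Python, changing how often the agent acts (every step, every k steps, or once per episode) is a change to the loop body rather than to a framework setting.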

Does PyTorch have similar functionality? If not, would I need to rewrite my custom environment to accommodate different time steps?
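Rewriting the environment may not be necessary: a wrapper can collapse each inner episode into a single step, so existing per-step agent code sees exactly one interaction per episode. A minimal sketch, assuming a hypothetical `OneDecisionPerEpisode` wrapper around a hypothetical Gym-style `ToyEnv`; the chosen action is simply repeated for every inner step, which is one possible design, not the only one:

```python
class ToyEnv:
    """Hypothetical per-step environment (Gym-style API): the reward equals
    the action, and each episode lasts `episode_length` steps."""

    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        return self.t, float(action), done, {}


class OneDecisionPerEpisode:
    """Hypothetical wrapper: the agent picks one action after reset(), and a
    single step() call replays it until the inner episode ends, returning the
    summed reward with done=True."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward, done, inner_steps = 0.0, False, 0
        while not done:
            obs, reward, done, _info = self.env.step(action)
            total_reward += reward
            inner_steps += 1
        return obs, total_reward, True, {"inner_steps": inner_steps}


env = OneDecisionPerEpisode(ToyEnv(episode_length=10))
obs = env.reset()
obs, reward, done, info = env.step(3)  # one agent interaction for the whole episode
print(reward, done, info)  # 30.0 True {'inner_steps': 10}
```

With Gym itself, the same idea could be written as a subclass of `gym.Wrapper` that overrides `step`. If the single per-episode decision should also react to intermediate observations, a different wrapper design would be needed.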