Deploying a Reinforcement Learning Model (DDPG)

Is there any literature that talks about deploying a trained agent? I've been looking, but I couldn't find any. I want to know how to deploy a trained agent, in my case one trained with the DDPG algorithm. Do I just instantiate the Actor with the same parameters as my trained Actor and load its weights? Do I need to include the learning algorithm from DDPG? Is the reward still needed, or can I leave it out?
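To make the question concrete, here is a minimal sketch of what I mean by "just calling the Actor": rebuild the network, load only the actor's saved weights, and run it in inference mode. The tiny `nn.Sequential` stand-in and the file name `actor_ddpg.pt` are placeholders, not my actual `ActorNetwork` — I'm assuming the weights were saved with `state_dict()` during training.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the trained ActorNetwork:
# 2 state inputs -> 2 actions, mirroring input_dims=(2,), n_actions=2.
actor = nn.Sequential(
    nn.Linear(2, 400), nn.ReLU(),
    nn.Linear(400, 300), nn.ReLU(),
    nn.Linear(300, 2), nn.Tanh(),
)

# During training, only the actor's weights would have been saved:
torch.save(actor.state_dict(), "actor_ddpg.pt")

# At deployment: rebuild the same architecture, load the weights,
# switch to eval mode, and run forward passes without gradients.
actor.load_state_dict(torch.load("actor_ddpg.pt"))
actor.eval()

state = torch.tensor([[0.5, -0.3]], dtype=torch.float)
with torch.no_grad():
    action = actor(state)
print(action.shape)  # torch.Size([1, 2])
```

My understanding (which I'd like confirmed) is that the critic, replay buffer, exploration noise, and reward signal are only needed for training, so none of them would appear in a script like this.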

import torch
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from matplotlib.widgets import TextBox
from tclab import clock

# ActorNetwork and PIEnv are defined in my training script
actor = ActorNetwork(alpha=0.0001, input_dims=(2,), fc1_dims=400,
                     fc2_dims=300, n_actions=2, name='actor')
env = PIEnv()

facecolor = 'lightgoldenrodyellow'
fig, axs = plt.subplots(2, figsize=(12, 8), tight_layout=True)

SP_input = env.states[1]

SP_mem = []
PV_mem = []
OP_mem = []
time_mem = []

states = env.reset()
action = None

def animate(i, PV_mem, SP_mem, OP_mem, time_mem):
    global states, action
    with torch.no_grad():
        state = torch.tensor([states], dtype=torch.float)
        if i % 180 == 0 or action is None:
            action = actor.forward(state)
    states_, PV, SP, OP, Ks = env.step(action.detach().numpy()[0], 50)
    states = states_
    print(f'Time: {int(i)}, PV: {PV:.2f}, SP: {SP:.0f}, OP: {OP:.2f}')

    # record history so the plots have something to draw
    time_mem.append(i)
    SP_mem.append(SP)
    PV_mem.append(PV)
    OP_mem.append(OP)


    # clear the axes each frame so lines don't pile up
    axs[0].cla()
    axs[1].cla()

    axs[0].plot(time_mem, SP_mem, 'r--', label='Setpoint')
    axs[0].plot(time_mem, PV_mem, 'b-', label='Process Variable')
    axs[0].set_ylabel('Temperature °C')

    axs[1].plot(time_mem, OP_mem, 'g-', label='Heater Output')
    axs[1].set_ylabel('Heater Output')
    axs[1].set_xlabel('Time (seconds)')

    axs[0].set_title(f'IAE: {states_[0]:.0f}', loc='left')
    axs[0].set_title(f'Kp: {Ks[0]:.2f}, Ki: {Ks[1]:.4f}', loc='right')

ani = FuncAnimation(fig, animate, fargs=(PV_mem, SP_mem, OP_mem, time_mem), interval=1000)
plt.show()

That is what I have, and I don't know whether it captures all the information I need to deploy my RL model. Any advice would be much appreciated. Thanks!