Deploying a Reinforcement Learning Model (DDPG)

Is there any literature that talks about deploying a trained agent? I've been looking, but I couldn't find any. I want to know how to deploy a trained agent, in my case one trained with the DDPG algorithm. Do I just instantiate the Actor with the same parameters as my trained Actor and load its weights? Do I need to include the learning algorithm from DDPG? Is the reward still needed, or can I leave it out?
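To make the question concrete, here is a minimal sketch of what I mean by "just calling the Actor": rebuild the network, load only the actor's saved weights, and run it in inference mode. The tiny `nn.Sequential` stand-in and the file name `actor_ddpg.pt` are placeholders, not my actual `ActorNetwork` — I'm assuming the weights were saved with `state_dict()` during training.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the trained ActorNetwork:
# 2 state inputs -> 2 actions, mirroring input_dims=(2,), n_actions=2.
actor = nn.Sequential(
    nn.Linear(2, 400), nn.ReLU(),
    nn.Linear(400, 300), nn.ReLU(),
    nn.Linear(300, 2), nn.Tanh(),
)

# During training, only the actor's weights would have been saved:
torch.save(actor.state_dict(), "actor_ddpg.pt")

# At deployment: rebuild the same architecture, load the weights,
# switch to eval mode, and run forward passes without gradients.
actor.load_state_dict(torch.load("actor_ddpg.pt"))
actor.eval()

state = torch.tensor([[0.5, -0.3]], dtype=torch.float)
with torch.no_grad():
    action = actor(state)
print(action.shape)  # torch.Size([1, 2])
```

My understanding (which I'd like confirmed) is that the critic, replay buffer, exploration noise, and reward signal are only needed for training, so none of them would appear in a script like this.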

import torch
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from matplotlib.widgets import TextBox
from tclab import clock

# ActorNetwork and PIEnv are defined in my training script
actor = ActorNetwork(alpha=0.0001, input_dims=(2,), fc1_dims=400,
                     fc2_dims=300, n_actions=2, name='actor')
env = PIEnv()

facecolor = 'lightgoldenrodyellow'
fig, axs = plt.subplots(2, figsize=(12, 8), tight_layout=True)

SP_input = env.states[1]

SP_mem = []
PV_mem = []
OP_mem = []
time_mem = []

states = env.reset()
action = None

def animate(i, PV_mem, SP_mem, OP_mem, time_mem):
    global states, action
    with torch.no_grad():
        state = torch.tensor([states], dtype=torch.float)
        if i % 180 == 0 or action is None:
            action = actor.forward(state)
    states_, PV, SP, OP, Ks = env.step(action.detach().numpy()[0], 50)
    states = states_
    print(f'Time: {int(i)}, PV: {PV:.2f}, SP: {SP:.0f}, OP: {OP:.2f}')

    # record history so the plots have something to draw
    time_mem.append(i)
    SP_mem.append(SP)
    PV_mem.append(PV)
    OP_mem.append(OP)


    # clear the axes each frame so lines don't pile up
    axs[0].cla()
    axs[1].cla()

    axs[0].plot(time_mem, SP_mem, 'r--', label='Setpoint')
    axs[0].plot(time_mem, PV_mem, 'b-', label='Process Variable')
    axs[0].set_ylabel('Temperature °C')

    axs[1].plot(time_mem, OP_mem, 'g-', label='Heater Output')
    axs[1].set_ylabel('Heater Output')
    axs[1].set_xlabel('Time (seconds)')

    axs[0].set_title(f'IAE: {states_[0]:.0f}', loc='left')
    axs[0].set_title(f'Kp: {Ks[0]:.2f}, Ki: {Ks[1]:.4f}', loc='right')

ani = FuncAnimation(fig, animate, fargs=(PV_mem, SP_mem, OP_mem, time_mem), interval=1000)
plt.show()

That is what I have, and I don't know whether it captures all the information I need to deploy my RL model. Any advice would be much appreciated. Thanks!