To improve sampling efficiency, I decided to use multiprocessing to do the sampling on the CPU (model prediction also on the CPU) and to train the model in a single process on the GPU.
The code for the multiple rollout workers is as follows:
```python
def multi_sampling(_env):
    # _env.actor's and _env.critic's named_parameters() become all 0 **after training**.
    # The parameters are correct before training.
    _env.env.reset()
    agent = Agent(_env)
    sample_res = agent.sampling()
    return sample_res

env = CartPole()  # env holds my actor model and critic model
process_pool = Pool(env.n_WORKER)
for sample_index in range(env.n_SAMPLES_PER_EPISODE):
    res = process_pool.apply_async(multi_sampling, args=(env,))
    res_dict[sample_index] = res
```
After training the models (i.e., updating the weights of the actor and critic), the models' weights and biases become 0.0 inside `multi_sampling`. The weights and biases are correct before the first training step.
If I use the CPU for both sampling and training, the results are perfect: I can solve the CartPole task in about 10 seconds (an average reward of 195 over 100 consecutive episodes).
My question is: how do I sample on the CPU and train on the GPU in an efficient and correct way?
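For context, here is a minimal sketch of the pattern I am trying to achieve, reduced to a toy example. The `Policy` class and `rollout` function are placeholders I made up (they are not my real `Agent`/`CartPole` code): the main process keeps the model on the GPU when one is available, and each worker receives a plain-CPU copy of the `state_dict` rather than the model object itself, so no CUDA tensors ever cross the process boundary.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp

class Policy(nn.Module):
    """Toy stand-in for the actor network."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

def rollout(cpu_state_dict):
    # Each worker rebuilds the model on the CPU from plain CPU tensors,
    # then runs its episode with gradients disabled.
    policy = Policy()
    policy.load_state_dict(cpu_state_dict)
    with torch.no_grad():
        obs = torch.randn(4)  # stand-in for env.reset()/env.step()
        action = policy(obs).argmax().item()
    return action

def main():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = Policy().to(device)
    # Copy the weights to the CPU BEFORE handing them to the workers:
    # the workers must never receive CUDA tensors directly.
    cpu_state = {k: v.detach().cpu() for k, v in model.state_dict().items()}
    # "spawn" avoids inheriting any CUDA context from the parent process.
    with mp.get_context("spawn").Pool(2) as pool:
        actions = pool.map(rollout, [cpu_state] * 4)
    # ...train on `device` with the collected samples, then refresh cpu_state...
    return actions

if __name__ == "__main__":
    print(main())
```

This is only a sketch of one commonly recommended pattern, not my working code; after each GPU training step the CPU `state_dict` copy would have to be refreshed before the next round of rollouts.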