I'm applying Deep Reinforcement Learning for the first time and have a few questions about it (I've already searched for answers, but in vain):
- How should I normalize the objectives' values in the reward function when they have very different scales? For example, one objective takes values in the range of 10 and another takes values in the range of 1000.
- During training, how can I monitor a network's weight updates and its gradient computation?
- In a multi-agent, episodic task, how should the `dones` vector be set? Should it become `True` only once all the agents have finished, or should `dones[agent_index] = True` be set as soon as that particular agent finishes the task? In other words, do we skip waiting for the last agent to finish before setting `dones = [True] * number_of_agents`?
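On the first question (objectives with very different scales), one common option is to rescale each objective to a comparable range before combining them. The sketch below is only an illustration under stated assumptions: `min_max_scale`, `combined_reward`, the bounds `[0, 10]` and `[0, 1000]`, and the equal weights are all hypothetical. When the ranges are not known in advance, `RunningNormalizer` (also hypothetical) standardizes a value using a running mean and variance (Welford's method) instead.

```python
# Hypothetical example: two objectives with very different scales.
# Assumption: each objective's plausible range [low, high] is known in advance.

def min_max_scale(value, low, high):
    """Map value from [low, high] to [0, 1], clipping values outside the range."""
    scaled = (value - low) / (high - low)
    return min(max(scaled, 0.0), 1.0)


def combined_reward(obj_a, obj_b, weight_a=0.5, weight_b=0.5):
    # Assumed ranges: obj_a roughly in [0, 10], obj_b roughly in [0, 1000].
    a = min_max_scale(obj_a, 0.0, 10.0)
    b = min_max_scale(obj_b, 0.0, 1000.0)
    # After scaling, both terms lie in [0, 1], so the weights express priority, not units.
    return weight_a * a + weight_b * b


class RunningNormalizer:
    """Running mean/variance (Welford's method) for when the ranges are unknown."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        # Fall back to unit variance until at least two samples have been seen.
        var = self.m2 / self.count if self.count > 1 else 1.0
        return (x - self.mean) / (var ** 0.5 + self.eps)


if __name__ == "__main__":
    print(combined_reward(5.0, 500.0))   # both objectives at mid-range
    norm = RunningNormalizer()
    for r in [1.0, 2.0, 3.0]:
        norm.update(r)
    print(norm.normalize(3.0))
```

Fixed bounds keep the reward stationary, which can make training easier to reason about. A running normalizer adapts automatically, but the scale of the reward then shifts while the agent is learning.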
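On the second question (watching weights and gradients), the general idea is to record the gradient and the change in each parameter at every update step. In PyTorch, for instance, each parameter's gradient is available in `param.grad` after `loss.backward()`, and you can compare a copy of the parameters taken before `optimizer.step()` with their values afterwards. The framework-free sketch below shows the same bookkeeping on a hypothetical toy linear model (`y_hat = w * x + b`, trained by plain gradient descent on mean squared error). The model, the data, and the function names are illustrative assumptions only.

```python
import math

# Hypothetical toy model: y_hat = w * x + b, trained by gradient descent on MSE.
# The point is the logging of gradients and weight updates, not the model itself.

def gradients(w, b, xs, ys):
    """Analytic gradients of MSE = mean((w*x + b - y)^2) with respect to w and b."""
    n = len(xs)
    grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return grad_w, grad_b


def train(xs, ys, lr=0.05, steps=200, log_every=50):
    w, b = 0.0, 0.0
    history = []  # (step, w, b, gradient norm, update norm) for every step
    for step in range(steps):
        grad_w, grad_b = gradients(w, b, xs, ys)
        grad_norm = math.hypot(grad_w, grad_b)           # size of the gradient
        new_w, new_b = w - lr * grad_w, b - lr * grad_b  # gradient descent step
        update_norm = math.hypot(new_w - w, new_b - b)   # size of the weight change
        w, b = new_w, new_b
        history.append((step, w, b, grad_norm, update_norm))
        if step % log_every == 0:
            print(f"step {step}: w={w:.4f} b={b:.4f} "
                  f"|grad|={grad_norm:.4f} |update|={update_norm:.4f}")
    return w, b, history


if __name__ == "__main__":
    # Data generated from y = 2x + 1.
    train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

Logging norms rather than full tensors keeps the output readable. Gradient norms that shrink toward zero, or that explode, are usually the first thing worth checking.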
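On the third question (the `dones` vector), one common convention is to set `dones[i]` as soon as agent `i` finishes and to end the episode only when every entry is `True`. Some libraries expose the episode-level flag separately; RLlib's multi-agent environments, for example, report a special `"__all__"` entry alongside the per-agent flags. Which convention is correct depends on your environment and training code. The sketch below is a hypothetical illustration that assumes each agent simply needs a fixed number of steps (`steps_to_finish`) to complete its task.

```python
# Hypothetical multi-agent episode loop with per-agent done flags.
# Assumption: agent i finishes after steps_to_finish[i] steps.

def run_episode(steps_to_finish):
    n_agents = len(steps_to_finish)
    dones = [False] * n_agents
    history = []  # snapshot of the dones vector after each time step
    t = 0
    while not all(dones):       # the episode ends only when every agent is done
        t += 1
        for i in range(n_agents):
            if dones[i]:
                continue        # finished agents no longer act
            if t >= steps_to_finish[i]:
                dones[i] = True  # flag this agent as soon as it finishes
        history.append(list(dones))
    return history


if __name__ == "__main__":
    for t, d in enumerate(run_episode([3, 5, 2]), start=1):
        print(t, d)
```

With per-agent flags, an agent that finishes early stops producing transitions, so its return is not bootstrapped past its own terminal step, while the environment keeps running for the remaining agents.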