Generalisation of Deep RL

Hi,
I am interested in evaluating deep reinforcement learning models. How can we evaluate a model trained with reinforcement learning? Can we quantify DRL generalization?
I mean, if we have a model trained to solve some tasks, can we just freeze its weights and run it on new samples from the same task? Something like the sketch below is what I have in mind.
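To make the question concrete, here is a minimal sketch of the kind of evaluation I mean: weights are frozen (no gradient updates) and the policy is rolled out on fresh, unseen episodes of the same task. This assumes a Gymnasium environment and a hypothetical `policy.act(observation)` method; names and the environment ID are just placeholders.

```python
# Minimal evaluation sketch (not tied to any specific library's trainer).
# Assumes `policy` is a trained, frozen policy exposing act(obs) -> action.
import gymnasium as gym
import numpy as np

def evaluate(policy, env_id="CartPole-v1", n_episodes=20, seed=0):
    """Roll out the frozen policy on fresh episodes and report mean return."""
    env = gym.make(env_id)
    returns = []
    for ep in range(n_episodes):
        obs, _ = env.reset(seed=seed + ep)  # new initial states the policy has not trained on
        done, total = False, 0.0
        while not done:
            action = policy.act(obs)        # inference only; weights stay fixed
            obs, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
        returns.append(total)
    env.close()
    return float(np.mean(returns)), float(np.std(returns))
```

Is averaging episodic return over held-out episodes like this enough to call it "generalization", or is there a more principled way to measure it (e.g., separate train/test distributions of levels or tasks)?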