Concatenating observations that include image, pose and sensor readings

@Omegastick Do you have a link to a paper that does what you have suggested?

It shouldn’t be the case that you’re using a CNN for the image and an MLP for the pose and sensor values just to make the inputs compatible with an RL algorithm’s function approximator.
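
To make sure we mean the same thing, here is a rough NumPy sketch of the setup as I understand it: a small convolutional encoder for the image, a one-layer MLP for the pose and sensor values, and the two feature vectors concatenated before they reach the policy or value network. All shapes, layer sizes and names (`conv2d_valid`, `encode_image`, `encode_vector`, `fuse`) are my own assumptions for illustration, and the weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


def conv2d_valid(image, kernels):
    """Naive 'valid' 2D cross-correlation.

    image:   (C, H, W)
    kernels: (F, C, kH, kW)
    returns: (F, H - kH + 1, W - kW + 1)
    """
    f, _, kh, kw = kernels.shape
    _, h, w = image.shape
    out = np.zeros((f, h - kh + 1, w - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = image[:, i:i + kh, j:j + kw]  # (C, kH, kW)
            # Sum over channel and both kernel axes, one value per filter.
            out[:, i, j] = np.tensordot(kernels, patch, axes=3)
    return out


def encode_image(image, kernels):
    """CNN branch: one conv layer, ReLU, then global average pooling."""
    feature_maps = relu(conv2d_valid(image, kernels))
    return feature_maps.mean(axis=(1, 2))  # (F,)


def encode_vector(x, weights, bias):
    """MLP branch: a single dense layer with ReLU."""
    return relu(weights @ x + bias)


def fuse(image, pose, sensors, params):
    """Encode each modality separately, then concatenate the features."""
    image_features = encode_image(image, params["kernels"])
    low_dim = np.concatenate([pose, sensors])
    vector_features = encode_vector(low_dim, params["w"], params["b"])
    return np.concatenate([image_features, vector_features])


# Hypothetical observation: 3x16x16 image, 7-D pose, 5 sensor readings.
image = rng.random((3, 16, 16))
pose = rng.standard_normal(7)
sensors = rng.standard_normal(5)

params = {
    "kernels": 0.1 * rng.standard_normal((4, 3, 3, 3)),  # 4 filters
    "w": 0.1 * rng.standard_normal((8, 12)),             # 12 inputs -> 8 units
    "b": np.zeros(8),
}

fused = fuse(image, pose, sensors, params)
print(fused.shape)  # (12,): 4 image features + 8 pose/sensor features
```

In this sketch the fused vector is what the RL algorithm's function approximator would take as input. Note that the two branches do more than reshape the observation: the convolution extracts spatial features from the image, which a flat concatenation of raw pixels with pose and sensor values would not do.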