Concatenating observations that include image, pose and sensor readings

While this approach might generally work, I've had trouble in the past concatenating the outputs of a pre-trained CNN and a fully connected (FC) model, possibly because the two outputs had very different value statistics. The combined model seemed to ignore the FC part and rely only on the CNN outputs.
After I carefully rescaled the outputs, it worked, so you might want to consider this as well.
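To make the rescaling idea concrete, here is a minimal sketch in NumPy. It is not the exact method I used back then, just one possible way to do it: standardize each branch's output to zero mean and unit variance before concatenating. The array shapes, the scales, and the `standardize` helper are all made up for illustration.

```python
import numpy as np

def standardize(features, eps=1e-8):
    """Rescale each feature column to zero mean and unit variance over the batch."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)  # eps guards against division by zero

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two branches' outputs (batch of 32 samples).
cnn_out = rng.normal(loc=0.0, scale=50.0, size=(32, 512))  # large-magnitude CNN features
fc_out = rng.normal(loc=0.0, scale=0.1, size=(32, 16))     # small-magnitude FC features

# Rescale each branch separately, then concatenate along the feature axis,
# so neither branch dominates purely because of its scale.
fused = np.concatenate([standardize(cnn_out), standardize(fc_out)], axis=1)
```

Inside a trainable model, a normalization layer on each branch before the concatenation would play a similar role, but I'd check the actual output statistics of both branches first.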
Let me know how it works out. :wink: