I'd like to ask the community if anyone has suggestions on representing a single state built from multiple sub-states for RL training.
For example, I have one sub-state represented by a 2D tensor (width by height), and another sub-state represented as a vector of n inputs.
I would like to combine these two sub-states into one input state for my neural architecture.
I'm not sure if it would be valid to simply flatten the 2D tensor into a 1D tensor and append the other vector input to it?
Appreciate any suggestions or advice, thank you!
Take an image and a vector as an example: you cannot combine the two tensors directly since their shapes differ. Instead, you can treat them as different modalities and use a multi-head network that takes them as separate inputs:
- im_head: a conv module with the image as input
- vec_head: a plain MLP module with the vector as input
Now you can explore a few ways of combining them:
1. flatten the im_head output, then concat it with the vec_head output, or
2. flatten the im_head output, add a linear layer so that it comes out the same shape as the vec_head output, then do element-wise addition, or
3. more generally, building on option 2, you can do FiLM (feature-wise transformations): https://distill.pub/2018/feature-wise-transformations/
Once combined, you can pass the result through further MLP layers and then on to whatever output architecture you need.
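A minimal sketch of the multi-head idea in PyTorch, assuming an 8×8 single-channel "image" sub-state and a 6-dim vector sub-state (all names, shapes, and layer sizes here are illustrative, not prescriptive). This version uses option 1 (flatten + concat); you could swap the fusion step for element-wise addition or FiLM without changing the two heads.

```python
import torch
import torch.nn as nn

class MultiHeadState(nn.Module):
    """Combine a 2D sub-state and a 1D sub-state into one feature vector."""

    def __init__(self, img_hw=(8, 8), vec_dim=6, hidden=64):
        super().__init__()
        h, w = img_hw
        # im_head: conv module over the 2D sub-state (treated as a 1-channel image)
        self.im_head = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),            # -> (batch, 8 * h * w)
        )
        # vec_head: plain MLP over the 1D sub-state
        self.vec_head = nn.Sequential(
            nn.Linear(vec_dim, hidden),
            nn.ReLU(),
        )
        # trunk: further MLP applied after combining the two head outputs
        self.trunk = nn.Sequential(
            nn.Linear(8 * h * w + hidden, hidden),
            nn.ReLU(),
        )

    def forward(self, img, vec):
        # img: (batch, 1, H, W), vec: (batch, vec_dim)
        z = torch.cat([self.im_head(img), self.vec_head(vec)], dim=-1)
        return self.trunk(z)

net = MultiHeadState()
out = net(torch.randn(4, 1, 8, 8), torch.randn(4, 6))
print(out.shape)  # torch.Size([4, 64])
```

The trunk output can then feed a policy head, Q-value head, etc. For option 2, you would add a `nn.Linear(8 * h * w, hidden)` after the flatten and replace `torch.cat` with `+`.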