Discussion on combining sub-states into one state for training an RL model


I would like to ask the community whether anyone has suggestions on representing a single state built from multiple sub-states for RL training.

For example, one sub-state is represented by a 2D tensor (width by height), while another is represented as a vector of n inputs.

I would like to combine these two sub-states into a single input state for my neural network.

Would it be valid to flatten the 2D tensor sub-state into a 1D tensor and append the vector to it?
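Mechanically, the flatten-and-append idea takes only a couple of lines. A minimal sketch, assuming PyTorch and hypothetical sizes (a batch of 4, an 8 x 6 grid, and n = 5 vector inputs):

```python
import torch

# Hypothetical sizes, for illustration only.
batch_size, width, height, n = 4, 8, 6, 5

grid = torch.rand(batch_size, width, height)  # 2D sub-state per sample
vec = torch.rand(batch_size, n)               # vector sub-state per sample

# Flatten everything except the batch dimension (8 * 6 = 48 values),
# then append the n vector values along the feature dimension.
state = torch.cat([grid.flatten(start_dim=1), vec], dim=1)
print(state.shape)  # torch.Size([4, 53])
```

This gives a valid input for a plain MLP, but flattening throws away the 2D neighbourhood structure that a convolution could otherwise exploit.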

I'd appreciate any suggestions or advice, thank you!

Taking an image and a vector as an example: you cannot combine the two tensors directly. Instead, you can treat them as different modalities and build a multi-head network that takes both as inputs:

  • im_head: a conv module that takes the image as input
  • vec_head: a plain MLP module that takes the vector as input
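A minimal sketch of the two heads, assuming PyTorch; the layer widths, kernel sizes, and input sizes are hypothetical choices. Note that `nn.Conv2d` expects a channel dimension, so a single 2D grid of shape (B, H, W) would need `.unsqueeze(1)` first.

```python
import torch
from torch import nn


class ImHead(nn.Module):
    """Conv head for the image/grid sub-state; input shape (B, C, H, W)."""

    def __init__(self, in_channels: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),  # keeps H, W
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # roughly halves H, W
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class VecHead(nn.Module):
    """Plain MLP head for the vector sub-state; input shape (B, n_inputs)."""

    def __init__(self, n_inputs: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        return self.net(v)


im_head = ImHead(in_channels=1)
vec_head = VecHead(n_inputs=5)
image = torch.rand(4, 1, 8, 6)  # batch of 4 single-channel 8 x 6 grids
vector = torch.rand(4, 5)       # batch of 4 vectors with n = 5
print(im_head(image).shape)     # torch.Size([4, 32, 4, 3])
print(vec_head(vector).shape)   # torch.Size([4, 64])
```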

Now you can explore a few ways of combining them:

  1. flatten the im_head output, then concatenate it with the vec_head output;
  2. flatten the im_head output and add a linear layer so its output has the same shape as the vec_head output, then add the two element-wise; or
  3. more generally, building on 2, use FiLM (Feature-wise Linear Modulation), where one input predicts a per-feature scale and shift that are applied to the other: https://distill.pub/2018/feature-wise-transformations/
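The three fusion options above can be sketched as follows, assuming PyTorch. The random tensors are hypothetical stand-ins for the head outputs. For FiLM, this sketch lets the vector features predict a per-channel scale (gamma) and shift (beta) for the conv feature map; that direction is one choice, and the reverse is also possible.

```python
import torch
from torch import nn

B, n_hidden = 4, 64
# Hypothetical stand-ins for the head outputs.
im_out = torch.rand(B, 32, 4, 3)   # conv feature map from im_head
vec_out = torch.rand(B, n_hidden)  # output of vec_head

im_flat = im_out.flatten(start_dim=1)  # (B, 32 * 4 * 3) = (B, 384)

# Option 1: concatenate along the feature dimension.
fused_cat = torch.cat([im_flat, vec_out], dim=1)  # (B, 448)

# Option 2: project image features to vec_head's width, then add element-wise.
proj = nn.Linear(im_flat.shape[1], n_hidden)
fused_add = proj(im_flat) + vec_out  # (B, 64)

# Option 3: FiLM. The vector predicts gamma and beta for each of the 32 channels,
# which scale and shift every spatial position of that channel.
film = nn.Linear(n_hidden, 2 * im_out.shape[1])
gamma, beta = film(vec_out).chunk(2, dim=1)  # each (B, 32)
fused_film = gamma[:, :, None, None] * im_out + beta[:, :, None, None]  # (B, 32, 4, 3)

print(fused_cat.shape, fused_add.shape, fused_film.shape)
```

With gamma fixed to 1, FiLM reduces to adding a learned shift, which is why it generalizes the element-wise addition in option 2.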

Once combined, you can pass the result through further MLP layers and then into whatever output architecture you need.
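Putting it together end to end, here is a minimal sketch assuming PyTorch and concatenation (option 1) as the fusion step. `MultiHeadNet`, the layer sizes, and `n_outputs=3` (for example Q-values or policy logits for three discrete actions) are hypothetical.

```python
import torch
from torch import nn


class MultiHeadNet(nn.Module):
    """Hypothetical two-input net: conv head + MLP head -> concat -> MLP trunk -> output."""

    def __init__(self, in_channels, height, width, n_vec, n_outputs, hidden=64):
        super().__init__()
        self.im_head = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),  # (B, 16, H, W) -> (B, 16 * H * W)
        )
        self.vec_head = nn.Sequential(nn.Linear(n_vec, hidden), nn.ReLU())
        fused_dim = 16 * height * width + hidden  # padding=1 keeps H and W unchanged
        self.trunk = nn.Sequential(
            nn.Linear(fused_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.out = nn.Linear(hidden, n_outputs)  # task-specific output layer

    def forward(self, image: torch.Tensor, vector: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.im_head(image), self.vec_head(vector)], dim=1)
        return self.out(self.trunk(fused))


net = MultiHeadNet(in_channels=1, height=8, width=6, n_vec=5, n_outputs=3)
image = torch.rand(4, 1, 8, 6)
vector = torch.rand(4, 5)
q = net(image, vector)
print(q.shape)  # torch.Size([4, 3])
```

Swapping the `torch.cat` line for the addition or FiLM variant only changes `fused_dim` and the fusion step; the trunk and output layer stay the same.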