Output of actor is (almost) the same for different states

You’re using batch norm, so it is expected that the output is roughly the same right after initialization.
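Batch norm normalizes each channel with the statistics of the batch, so if you apply an affine transform to your image (rescale the pixel values by a positive factor, or shift them), the result will be the same.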
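Here is a quick way to see the affine-invariance point for yourself. This is only a minimal sketch, not your actual actor: the batch shape, the scale `2.5` and the shift `0.7` are made-up values, and it uses a single freshly initialized `nn.BatchNorm2d` in training mode applied directly to the input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Freshly initialized batch norm (weight=1, bias=0), in training mode so
# that it normalizes with the statistics of the current batch.
bn = nn.BatchNorm2d(3)
bn.train()

# Hypothetical batch of 8 RGB observations of size 16x16 (assumed shape).
x = torch.rand(8, 3, 16, 16)

# Affine transform of the pixel values: positive scale plus a shift.
x_affine = 2.5 * x + 0.7

with torch.no_grad():
    out = bn(x)
    out_affine = bn(x_affine)

# The two outputs match up to the small effect of eps in the denominator.
max_diff = (out - out_affine).abs().max().item()
same = torch.allclose(out, out_affine, atol=1e-3)
print(f"max abs difference: {max_diff:.2e}, same output: {same}")
```

The same reasoning should roughly carry over when the batch norm sits after a conv layer, since the conv is linear and the scale and shift end up as a per-channel scale and offset that the normalization removes (padding at the borders can make this only approximate).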
As an aside, using batch norm in RL is not always a great idea, as you may lose some “temporal” information about your data. It is not a common thing to do.
Check our ConvNet in torchrl for a quick implementation.