I am well aware that a “normal” neural network should use normalized input data, so that no single variable has a bigger influence on the weights in the NN than the others.

But what if you have a Q-network where the training data and test data can differ a lot and can change over time in a continuous problem?

My idea was to do one run without normalizing the input data, record the mean and variance of the inputs seen during that run, and then use those statistics to normalize the input data in my next run.
But what is the standard approach in this case?
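
To make the idea concrete, here is a minimal sketch of that two-pass scheme in plain Python. The helper names `fit_stats` and `normalize` and the logged states are made up for illustration; in a real setup the states would come from the environment and would probably be tensors.

```python
import math

def fit_stats(states):
    """Per-feature mean and standard deviation over a list of state vectors."""
    n = len(states)
    dim = len(states[0])
    mean = [sum(s[i] for s in states) / n for i in range(dim)]
    var = [sum((s[i] - mean[i]) ** 2 for s in states) / n for i in range(dim)]
    std = [math.sqrt(v) for v in var]
    return mean, std

def normalize(state, mean, std, eps=1e-8):
    """Standardize one state with fixed statistics; eps avoids division by zero."""
    return [(x - m) / (s + eps) for x, m, s in zip(state, mean, std)]

# Hypothetical states logged during a first, unnormalized run
first_run = [[0.0, 100.0], [2.0, 300.0], [4.0, 200.0]]
mean, std = fit_stats(first_run)

# In the next run, the statistics stay fixed
normalized = normalize([2.0, 200.0], mean, std)
```

Because the statistics are fixed during the second run, the same input always maps to the same output, but they can become stale if the state distribution drifts.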

# new_state must be a plain tensor;
# if it is a Variable, use new_state.data
normalizer.observe(new_state)
new_state = normalizer.normalize(new_state)
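
The `normalizer` object itself is not shown in this snippet. As a rough sketch of what such a class could look like, here is a running mean/variance tracker (Welford's update) with the same `observe` / `normalize` method names. It uses plain Python lists instead of tensors so that it runs on its own; the internals are an assumption, not the original implementation.

```python
import math

class Normalizer:
    """Running per-feature mean and variance, updated one observation at a time."""

    def __init__(self, num_inputs):
        self.n = 0
        self.mean = [0.0] * num_inputs
        self.m2 = [0.0] * num_inputs  # running sum of squared deviations

    def observe(self, x):
        # Welford's online update of the mean and squared deviations
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def normalize(self, x, eps=1e-8):
        # Standardize with the statistics observed so far
        out = []
        for i, xi in enumerate(x):
            var = self.m2[i] / self.n if self.n > 0 else 0.0
            out.append((xi - self.mean[i]) / (math.sqrt(var) + eps))
        return out

normalizer = Normalizer(num_inputs=2)
for new_state in ([1.0, 10.0], [3.0, 30.0]):
    normalizer.observe(new_state)
    new_state = normalizer.normalize(new_state)
```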

Ah I see, there is something odd here: the input of observe is a Variable while the input of normalize is a tensor. Let me correct it so that everything (input and output) is a plain tensor.

Do we agree that this kind of “online” normalization is not injective? That is, two distinct inputs that are observed and normalized at different times may be mapped to the same output value.

Furthermore, this mapping / filtering / normalization is not guaranteed to be monotonic (especially at the beginning, when very little data has been observed).
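
A small 1-D example illustrates both points. The stream below is made up, and `online_normalize` is a hypothetical helper that observes each value and then normalizes it with the statistics seen so far (recomputed from the full history for clarity, not efficiency).

```python
import statistics

def online_normalize(stream, eps=1e-8):
    """Observe each value, then normalize it with the statistics seen so far."""
    history, outputs = [], []
    for x in stream:
        history.append(x)                         # observe
        mean = statistics.fmean(history)
        std = statistics.pstdev(history)          # population std of the history
        outputs.append((x - mean) / (std + eps))  # normalize
    return outputs

stream = [0.0, 2.0, 8.0, 0.0, 2.0, 6.0]  # made-up 1-D inputs
outputs = online_normalize(stream)

# Not injective: inputs 2.0 (step 2) and 6.0 (step 6) both map to about 1.
same_output = abs(outputs[1] - outputs[5]) < 1e-6

# Not monotonic: input 2.0 at step 5 maps below input 0.0 at step 1.
order_flipped = outputs[4] < outputs[0]
```

At step 2 the history is [0, 2] (mean 1, std 1), and at step 6 it is the whole stream (mean 3, std 3), so 2.0 and 6.0 each sit exactly one standard deviation above the mean at the time they are normalized.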

A bit of a late reply, but this kind of normalization made me struggle for quite some time. With it, my DQN applied to CartPole could not learn properly, so you definitely need to be very careful.
From now on I will either not normalize at all or freeze the update of the normalization parameters after some initial steps.
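
For the second option, one possible sketch is a running normalizer that ignores new observations once a warm-up budget is used up. The class name `FreezableNormalizer` and the `warmup_steps` parameter are hypothetical, and plain lists stand in for tensors.

```python
import math

class FreezableNormalizer:
    """Running mean/variance that stops updating after `warmup_steps` observations."""

    def __init__(self, num_inputs, warmup_steps):
        self.warmup_steps = warmup_steps  # hypothetical parameter
        self.n = 0
        self.mean = [0.0] * num_inputs
        self.m2 = [0.0] * num_inputs  # running sum of squared deviations

    def observe(self, x):
        if self.n >= self.warmup_steps:
            return  # frozen: statistics no longer change
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def normalize(self, x, eps=1e-8):
        if self.n == 0:
            return list(x)  # nothing observed yet, pass the input through
        return [(xi - m) / (math.sqrt(m2 / self.n) + eps)
                for xi, m, m2 in zip(x, self.mean, self.m2)]

normalizer = FreezableNormalizer(num_inputs=1, warmup_steps=2)
for state in ([0.0], [2.0], [1000.0]):
    normalizer.observe(state)  # the third call is ignored
```

Once frozen, the mapping is a fixed affine function per feature, so it is injective and monotonic again; the cost is that the statistics only reflect the warm-up states.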