This might be somewhat splitting hair, but I am wondering about this:
When you calculate the dataset mean and std, do you compute it over just the training data or do you compute it over the entire dataset (train + test)?
What are the underlying mathematical considerations when making this decision?
Thank you!