Intro: I have a dataset whose instances are time series, but I am mainly interested in instance-wise anomaly detection (is a given instance an anomaly or not) with different types of autoencoders, such as a plain autoencoder, GRU-based and LSTM-based autoencoders, etc.
Question 1
Which cost function is best/recommended for autoencoders in anomaly detection, and why?

Binary Cross Entropy Loss (BCELoss)
According to the documentation, the target y has to be in the range [0, 1]. The decoder output is usually squashed into that range with a Sigmoid on the last layer, but I guess I also have to normalize the input into [0, 1] before training. That means I have to save the min and max computed by MinMaxScaler from scikit-learn and apply them at prediction time before running the autoencoder. Is there a more elegant way to do the same normalization in PyTorch and save it together with the PyTorch model?
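To make the normalization question concrete, here is a minimal sketch of what I have in mind, assuming the min/max statistics can be kept as module buffers. `MinMaxNorm`, `fit`, `data_min` and `data_max` are hypothetical names of mine, not an existing PyTorch API; the only PyTorch mechanism relied on is `register_buffer`, which stores non-trainable tensors in the `state_dict`.

```python
import torch
from torch import nn


class MinMaxNorm(nn.Module):
    """Hypothetical helper: min-max scaling whose statistics live in the state_dict."""

    def __init__(self, num_features: int, eps: float = 1e-8):
        super().__init__()
        self.eps = eps
        # Buffers are saved and loaded with the model but are never trained.
        self.register_buffer("data_min", torch.zeros(num_features))
        self.register_buffer("data_max", torch.ones(num_features))

    @torch.no_grad()
    def fit(self, x: torch.Tensor) -> "MinMaxNorm":
        # x: (..., num_features); reduce over every dimension except the last.
        flat = x.reshape(-1, x.shape[-1])
        self.data_min.copy_(flat.min(dim=0).values)
        self.data_max.copy_(flat.max(dim=0).values)
        return self

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scaled = (x - self.data_min) / (self.data_max - self.data_min + self.eps)
        # Unseen data may fall outside the training range, so clamp into [0, 1].
        return scaled.clamp(0.0, 1.0)


# Toy usage: 16 series, 10 time steps, 3 features (random stand-in data).
train = torch.randn(16, 10, 3) * 4 + 2
scaler = MinMaxNorm(num_features=3).fit(train)
print(scaler(train).min().item(), scaler(train).max().item())
```

If such a module were a submodule of the autoencoder, saving `model.state_dict()` would carry the min and max along with the weights, so no separate scikit-learn scaler would need to be persisted. Whether clamping out-of-range values is desirable for anomaly detection is a design choice I am unsure about.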
Mean Squared Error Loss (MSELoss)
Here I believe I need neither normalization nor a Sigmoid at the end.
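For reference, this is how I understand the two setups, as a small sketch on random stand-in tensors (`decoder_out` is a placeholder for the raw output of whatever decoder is used, not a real model). `nn.BCEWithLogitsLoss` is included only because it combines the Sigmoid and the BCE in one call.

```python
import torch
from torch import nn

torch.manual_seed(0)
x_raw = torch.randn(8, 20, 3) * 5.0       # unscaled batch: (batch, time, features)
x_01 = (x_raw - x_raw.min()) / (x_raw.max() - x_raw.min())  # same batch scaled into [0, 1]
decoder_out = torch.randn(8, 20, 3)       # stand-in for the raw decoder output

# Option A: BCELoss needs both the prediction and the target in [0, 1].
loss_bce = nn.BCELoss()(torch.sigmoid(decoder_out), x_01)

# Same quantity computed directly from the raw outputs (no explicit Sigmoid layer).
loss_bce_logits = nn.BCEWithLogitsLoss()(decoder_out, x_01)

# Option B: MSELoss puts no range constraint on prediction or target.
loss_mse = nn.MSELoss()(decoder_out, x_raw)

print(loss_bce.item(), loss_bce_logits.item(), loss_mse.item())
```

Even if MSELoss does not strictly require scaling, features with very different ranges would contribute unequally to the error, so some standardization may still help.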
Or is there some other, more appropriate cost function for this problem, whether specific to time series or not?
Question 2
How can I implement a predict_proba method for this type of autoencoder?
I understand that I need to set a threshold and predict 1 (outlier) if the difference between the input and the reconstructed values is greater than the threshold, and 0 (inlier) otherwise, but I'm not sure how to implement this efficiently or how to calculate the probabilities.
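A rough sketch of the thresholding idea, under assumptions of my own: the per-instance score is the mean squared reconstruction error, the threshold is a high quantile of the training errors, and the "probability" is just a logistic squashing of the distance to the threshold (a heuristic score, not a calibrated probability). All function names and the toy model below are hypothetical.

```python
import torch
from torch import nn


@torch.no_grad()
def reconstruction_error(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Per-instance MSE; x has shape (batch, time, features)."""
    model.eval()
    recon = model(x)
    # Average over time and features, keeping one score per instance.
    return ((recon - x) ** 2).flatten(start_dim=1).mean(dim=1)


def fit_threshold(train_errors: torch.Tensor, quantile: float = 0.99) -> torch.Tensor:
    # Assumption: training data is mostly normal, so a high quantile marks the tail.
    return torch.quantile(train_errors, quantile)


def predict(errors: torch.Tensor, threshold: torch.Tensor) -> torch.Tensor:
    # 1 = outlier, 0 = inlier.
    return (errors > threshold).long()


def predict_proba(errors: torch.Tensor, threshold: torch.Tensor,
                  scale: torch.Tensor) -> torch.Tensor:
    # Heuristic score in (0, 1); equals 0.5 exactly at the threshold.
    return torch.sigmoid((errors - threshold) / scale)


# Toy usage with an untrained stand-in autoencoder on random data.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 2), nn.Linear(2, 4))
train_x = torch.randn(64, 10, 4)
train_err = reconstruction_error(model, train_x)
threshold = fit_threshold(train_err)
scale = train_err.std().clamp_min(1e-8)

test_err = reconstruction_error(model, torch.randn(5, 10, 4))
print(predict(test_err, threshold), predict_proba(test_err, threshold, scale))
```

Everything is computed batch-wise on tensors, so it should run on the GPU without Python loops. An alternative I considered for the score is the empirical CDF of the training errors, which would read as "fraction of training instances with a smaller error".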