My setup and assumptions are as follows:

The dataset (both the training set and the test set) consists of color images

The input of VAE is [batch_size, 3, 256, 256]
VAE has been trained, including an encoder and decoder
The output of the encoder is mu and log_var, each of dimension [batch_size, 256]
The input of the decoder is [batch_size, 3, 256, 256]

The data x to be tested is [batch_size, 3, 256, 256]

I want to use Algorithm 4 from the paper “Variational Autoencoder based Anomaly Detection using Reconstruction Probability” to do anomaly detection, but I don’t know how to define the reconstruction probability function, or how to get mu and log_var from the decoder. (My decoder's output is a color image.)

Note that in the paper the output of the decoder is not just a “point estimate” as in your description (assuming [batch_size, 3, 256, 256] is the output of the decoder rather than its input) but a distribution p(x | z), which is assumed to be Gaussian when the inputs are continuous.

So you should have (for example)

encoder input [batch_size, 3, 256, 256]

encoder output mu, log_var as [batch_size, 256]

decoder input [batch_size, 256]

decoder output mean, log_variance of [batch_size, 3, 256, 256]
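As a rough illustration of the shapes above, here is a minimal sketch of a decoder with two output heads. The class name `GaussianDecoder`, the `hidden` size, and the fully connected layers are all assumptions made for brevity; a real image decoder would more likely use transposed convolutions, and your trained decoder will look different.

```python
import torch
from torch import nn


class GaussianDecoder(nn.Module):
    """Hypothetical decoder returning the mean and log-variance of p(x | z)."""

    def __init__(self, latent_dim=256, out_shape=(3, 256, 256), hidden=64):
        super().__init__()
        self.out_shape = out_shape
        out_dim = out_shape[0] * out_shape[1] * out_shape[2]
        # Shared body, followed by one head per distribution parameter.
        self.body = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU())
        self.mean_head = nn.Linear(hidden, out_dim)
        self.log_var_head = nn.Linear(hidden, out_dim)

    def forward(self, z):
        h = self.body(z)
        mean = self.mean_head(h).view(-1, *self.out_shape)
        log_var = self.log_var_head(h).view(-1, *self.out_shape)
        return mean, log_var


# Small images keep this demo light; the question's shape would be (3, 256, 256).
decoder = GaussianDecoder(latent_dim=256, out_shape=(3, 32, 32))
mean, log_var = decoder(torch.randn(4, 256))
```

Both outputs have the same shape as the input image, so every pixel gets its own mean and variance.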

The reconstruction probability is the likelihood of the input under the encoder-decoder pair. Computing it exactly would require integrating over the latents, so it is approximated by sampling: latents are drawn from the Gaussian defined by the mu and log_var output by the encoder, the likelihood of the input under the decoder's output distribution is computed for each sampled latent, and the results are averaged over the samples.
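The sampling procedure described here could be sketched roughly as follows. This assumes `encoder(x)` returns `(mu, log_var)` and `decoder(z)` returns `(mean, log_var)`, both with diagonal Gaussians; the function name and the `n_samples` parameter are made up for illustration. One deviation to be aware of: the average is taken in log space (via `logsumexp`), because a product of densities over 3 x 256 x 256 pixels would underflow if computed directly.

```python
import math
import torch


def reconstruction_log_prob(x, encoder, decoder, n_samples=10):
    """Log of the Monte Carlo averaged likelihood p(x | z), one score per input."""
    with torch.no_grad():
        mu_z, log_var_z = encoder(x)
        std_z = torch.exp(0.5 * log_var_z)
        log_probs = []
        for _ in range(n_samples):
            # Sample a latent from the encoder's Gaussian.
            z = mu_z + std_z * torch.randn_like(std_z)
            mean_x, log_var_x = decoder(z)
            # Per-pixel Gaussian log-density of the input under the decoder output.
            log_p = -0.5 * (
                math.log(2 * math.pi)
                + log_var_x
                + (x - mean_x) ** 2 / torch.exp(log_var_x)
            )
            # Pixels are treated as independent, so log-densities add up.
            log_probs.append(log_p.flatten(1).sum(dim=1))
        stacked = torch.stack(log_probs)  # [n_samples, batch_size]
        # log of the mean probability over samples, computed stably.
        return torch.logsumexp(stacked, dim=0) - math.log(n_samples)
```

A lower score means the input is less likely under the model. Inputs whose score falls below a threshold (chosen, for example, on held-out normal data) would be flagged as anomalies.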

If you cannot modify your setup so that the decoder outputs both a mean and a variance, you could assume a fixed log_var (perhaps estimated from “known good” data) and use the output of your decoder as the means.
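One possible way to do this fallback is sketched below. It assumes a single scalar variance shared by all pixels, estimated as the mean squared reconstruction error on known-good data; both function names are hypothetical, and a per-channel or per-pixel variance would be an equally reasonable choice.

```python
import math
import torch


def estimate_fixed_log_var(x_good, x_recon):
    """Scalar log-variance from the mean squared residual on known-good data."""
    return torch.log(((x_good - x_recon) ** 2).mean())


def gaussian_log_likelihood(x, x_recon, log_var):
    """Log-likelihood of x with the decoder output as mean and a fixed log_var."""
    log_p = -0.5 * (
        math.log(2 * math.pi) + log_var + (x - x_recon) ** 2 / torch.exp(log_var)
    )
    # Sum over channels and pixels to get one score per image.
    return log_p.flatten(1).sum(dim=1)
```

With one shared variance and a single latent sample, this score is a decreasing function of the squared reconstruction error, so it ranks inputs the same way as plain reconstruction error would. The benefit of learned per-pixel variances is lost in this setting.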