PSNR metric for Image Super Resolution


I am doing some research in image super-resolution, and I am currently implementing some papers from the NTIRE 2017 challenge (with the DIV2K dataset). The models trained fine, but there is one thing I don't quite understand about the metrics, in particular the Peak Signal-to-Noise Ratio, or PSNR.
The PSNR is calculated for each image as 10 log10(1 / MSE), where MSE is the mean squared error between the high-resolution ground truth and the super-resolution prediction (with pixel values scaled to [0, 1]). However, this is for one image only. The interesting part is how to aggregate the results over all the images in the validation dataset.

  1. The first way I thought of was to: 1) calculate the MSE for each image; 2) average the MSE over the entire dataset; 3) apply the PSNR formula to this average MSE.
  2. The second way I tried was to average the per-image PSNR over all the images.
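
To make the two options concrete, here is a minimal sketch in Python with NumPy. It assumes the standard definition PSNR = 10 log10(MAX² / MSE) with pixel values in [0, 1] (so MAX = 1). The function names (`mse`, `psnr_from_mse`, `psnr_of_mean_mse`, `mean_of_psnr`) and the toy images are made up for illustration and are not taken from any NTIRE evaluation code.

```python
import numpy as np


def mse(hr, sr):
    """Mean squared error between two images with pixel values in [0, 1]."""
    hr = np.asarray(hr, dtype=np.float64)
    sr = np.asarray(sr, dtype=np.float64)
    return float(np.mean((hr - sr) ** 2))


def psnr_from_mse(m, max_val=1.0):
    """PSNR in dB for a given MSE; max_val is the assumed peak pixel value."""
    return float(10.0 * np.log10(max_val ** 2 / m))


def psnr_of_mean_mse(pairs):
    """Way 1: average the per-image MSE first, then apply the PSNR formula once."""
    mean_mse = np.mean([mse(hr, sr) for hr, sr in pairs])
    return psnr_from_mse(mean_mse)


def mean_of_psnr(pairs):
    """Way 2: compute the PSNR of every image, then average the dB values."""
    return float(np.mean([psnr_from_mse(mse(hr, sr)) for hr, sr in pairs]))


# Toy validation set (hypothetical): one poorly and one well reconstructed image.
hr_a, sr_a = np.zeros((4, 4)), np.full((4, 4), 0.1)    # MSE = 1e-2 -> 20 dB
hr_b, sr_b = np.zeros((4, 4)), np.full((4, 4), 0.01)   # MSE = 1e-4 -> 40 dB
pairs = [(hr_a, sr_a), (hr_b, sr_b)]

print(f"way 1, PSNR of mean MSE: {psnr_of_mean_mse(pairs):.2f} dB")
print(f"way 2, mean of PSNRs:    {mean_of_psnr(pairs):.2f} dB")
```

On this toy set, way 1 gives roughly 22.97 dB while way 2 gives 30 dB: the first aggregate is dominated by the worst image, whereas the second weights every image equally in dB.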

I think the correct way to do this is the first one, because I don't want to average log values. However, I found that the model only got the correct score with the second version (31 dB vs. 35 dB PSNR).
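
For reference, a gap between the two aggregates is expected in general. As a sketch, assuming pixel values in [0, 1] so that the per-image value is PSNR_i = -10 log10(MSE_i) over N validation images, Jensen's inequality for the convex function -log10 gives:

```latex
\frac{1}{N}\sum_{i=1}^{N} \mathrm{PSNR}_i
  = \frac{1}{N}\sum_{i=1}^{N} \bigl(-10\log_{10}\mathrm{MSE}_i\bigr)
  \;\ge\; -10\log_{10}\!\left(\frac{1}{N}\sum_{i=1}^{N}\mathrm{MSE}_i\right)
```

So the average of the per-image PSNRs is never lower than the PSNR of the average MSE, and the two coincide only when every image has the same MSE.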

That said, what do you think about this?