Why does F.mse_loss behave differently for Tensor and Parameter inputs?

Below is my code:

import torch as pt
from torch.nn import functional as F

a = pt.Tensor([[0, 1], [2, 3]])
b = pt.Tensor([[1, 0], [5, 4]])
print(F.mse_loss(a, b), F.mse_loss(a, b, reduction='elementwise_mean'))

# wrap the same values as Parameters (requires_grad=True by default)
a = pt.nn.Parameter(a)
b = pt.nn.Parameter(b)
print(F.mse_loss(a, b), F.mse_loss(a, b, reduction='elementwise_mean'))

The output was:

tensor(3.) tensor(3.)
tensor(12., grad_fn=<SumBackward0>) tensor(12., grad_fn=<SumBackward0>)

Why do the Tensor and Parameter versions give two different results?
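
For reference, computing the reduction by hand (re-creating the plain Tensors) shows that 3. is the mean of the squared differences and 12. is their sum, so the Parameter calls appear to return the sum instead of the mean:

import torch as pt

a = pt.Tensor([[0, 1], [2, 3]])
b = pt.Tensor([[1, 0], [5, 4]])
sq = (a - b) ** 2           # tensor([[1., 1.], [9., 1.]])
print(sq.mean(), sq.sum())  # tensor(3.) tensor(12.)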

Environment:
Python 3.6
PyTorch 0.4.1

This is a bug, sorry about that. It was previously reported at https://github.com/pytorch/pytorch/issues/10009, and we have already landed a fix on master.
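
Until a release containing the fix is out, one way to work around it on 0.4.1 is to reduce manually, which avoids the affected reduction handling in F.mse_loss. A minimal sketch (the helper name mse_mean is just for illustration):

import torch as pt

def mse_mean(x, y):
    # Mean squared error with an explicit mean reduction,
    # bypassing F.mse_loss's reduction-string handling.
    return ((x - y) ** 2).mean()

a = pt.nn.Parameter(pt.Tensor([[0, 1], [2, 3]]))
b = pt.nn.Parameter(pt.Tensor([[1, 0], [5, 4]]))
print(mse_mean(a, b))  # tensor(3., grad_fn=...)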