This is probably a very simple question, but I find it a bit confusing.

Below is the standard MSE (mean squared error) loss function as I know it:

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - y'_i\right)^2$$
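A minimal sketch of the formula above, assuming one scalar output per sample (the function name `mse` is just illustrative). Here `n` is simply the number of (target, prediction) pairs being averaged. In a typical mini-batch training loop, where the loss is computed once per batch, that number is the mini-batch size.

```python
def mse(targets, predictions):
    """Mean squared error over n samples, each with a single scalar output.

    n is the number of (target, prediction) pairs passed in; when the loss
    is computed per mini-batch, this is the mini-batch size.
    """
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    n = len(targets)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(targets, predictions)) / n


batch_targets = [1.0, 2.0, 3.0, 4.0]      # y_i for a mini-batch of n = 4
batch_predictions = [1.5, 2.0, 2.0, 4.0]  # y'_i from the network
print(mse(batch_targets, batch_predictions))  # (0.25 + 0 + 1 + 0) / 4 = 0.3125
```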

What does n mean here? Is it the size of a mini-batch?

One more question: to express the loss of a neural network with multiple output neurons, shouldn’t the general MSE formula be written as follows?

$$\frac{1}{m k}\sum_{i=1}^{m}\sum_{j=1}^{k} \left(y_{i,j} - y'_{i,j}\right)^2$$

where

m = mini-batch size

k = the number of output neurons
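A sketch of the two-sum version, assuming targets and predictions are given as an m × k list of lists (the function names are hypothetical). It also checks that dividing the total by m·k gives the same value as first averaging each sample's squared error over its k outputs and then averaging those m values over the batch.

```python
def mse_multi(targets, predictions):
    """MSE over an m x k batch: m samples, k output neurons per sample.

    Sums the squared error over every element and divides by m * k,
    i.e. averages over all elements of the batch.
    """
    m = len(targets)
    k = len(targets[0])
    total = 0.0
    for y_row, y_hat_row in zip(targets, predictions):
        for y, y_hat in zip(y_row, y_hat_row):
            total += (y - y_hat) ** 2
    return total / (m * k)


def mse_per_sample_then_batch(targets, predictions):
    """Average each sample's squared error over its k outputs, then over the m samples."""
    per_sample = [
        sum((y - y_hat) ** 2 for y, y_hat in zip(y_row, y_hat_row)) / len(y_row)
        for y_row, y_hat_row in zip(targets, predictions)
    ]
    return sum(per_sample) / len(per_sample)


targets = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]      # m = 2 samples, k = 3 outputs
predictions = [[0.5, 0.0, 2.0], [0.0, 0.0, 1.0]]
print(mse_multi(targets, predictions))
print(mse_per_sample_then_batch(targets, predictions))  # both ≈ 0.2083
```

If the squared errors are instead summed over the k outputs and divided only by m, the loss is k times larger. That constant factor rescales the gradients (which interacts with the learning rate) but does not move the minimum.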