Hello. There are many examples of neural networks for the MNIST handwritten-digit classification problem, where the output is a 10-element softmax vector whose maximum value corresponds to the prediction. In that case the label for a data sample is just a single number (a one-element label that takes values from 0 to 9). What should the last layer and the loss function be if I want to predict a multi-element label, where the label for each data sample consists of many elements (say, 128)? In other words, how can a network predict a function?
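For reference, the MNIST setup described above can be sketched roughly as follows. This is a minimal illustration, not any particular library's implementation. The logit values are made up, and the one-hot encoding with a cross-entropy loss is the common convention for this kind of classifier. It shows the 10-element softmax vector, the prediction taken as the index of its maximum, and the single-number label.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical raw outputs (logits) of the last layer for one MNIST image.
logits = np.array([0.1, 2.0, -1.0, 0.3, 4.5, 0.0, -0.5, 1.2, 0.7, -2.0])

probs = softmax(logits)                   # 10 non-negative values summing to 1
predicted_digit = int(np.argmax(probs))   # index of the maximum = predicted class

# The label is a single number 0..9. For the cross-entropy loss it is
# commonly encoded as a one-hot 10-element vector.
label = 4
one_hot = np.eye(10)[label]
cross_entropy = -np.sum(one_hot * np.log(probs))  # equals -log(probs[label])

print(predicted_digit)  # 4
```

Only one element of the one-hot label is non-zero, which is why a single number suffices as the label in this setting.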

As a toy example, consider the following problem. We are given an arbitrary greyscale image I(x,y) as input. We want to integrate the image along the horizontal axis (x-axis) to get the result as a series of numbers f(y) = ∑ₓ I(x,y), one value per row y. We want our neural network to output f(y).
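The target f(y) for this toy problem can be computed directly, which is how one would generate the multi-element training labels. This is a minimal sketch under a few assumptions. The image is stored as a NumPy array indexed image[y, x] (rows are y, columns are x). The image has 128 rows, so that f(y) has 128 elements like the label size in the question. The random pixel values are placeholder data only.

```python
import numpy as np

# Hypothetical greyscale image with 128 rows (y) and 64 columns (x).
# Assumption: indexed as image[y, x], the usual row-major image layout.
rng = np.random.default_rng(seed=0)
height, width = 128, 64
image = rng.random((height, width))

# f(y) = sum over x of I(x, y): sum along the horizontal axis (the columns).
f = image.sum(axis=1)

# The same computation written as an explicit double loop, for clarity.
f_loop = np.array([sum(image[y, x] for x in range(width)) for y in range(height)])
assert np.allclose(f, f_loop)

print(f.shape)  # (128,)
```

Each training sample is then the pair (image, f): one 2-D input and one 128-element label vector.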