Turning a 1D pixel image into labeled coordinates

Hi,

I’m trying to label the coordinates of 2 objects in 1D space using a neural network.

My input would look like 28 time steps of something like

[0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]

and the output should be the coordinates of both objects (e.g. [ 9. 21.]) for each time step.
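To make the format concrete, here is a minimal NumPy sketch of one input/target pair for a single time step. The helper name `make_sample` and the choice to sort the coordinates are just assumptions for illustration, not part of any library:

```python
import numpy as np

def make_sample(positions, width=28):
    """Build one 1D 'image' with a 1 at each object position, plus its target.

    `make_sample` is a hypothetical helper used only for this illustration.
    """
    x = np.zeros(width, dtype=np.float32)
    x[list(positions)] = 1.0
    # Target: the object coordinates in ascending order, as floats.
    y = np.array(sorted(positions), dtype=np.float32)
    return x, y

x, y = make_sample([9, 21])
print(x)  # the 28-pixel input row
print(y)  # the coordinates the network should output

# For comparison, the coordinates can be read straight off the input:
print(np.flatnonzero(x))
```

A full sequence would then be 28 such rows stacked together, with one pair of coordinates per row.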

Could anyone suggest some starting points to look at, and which loss I should use to train the network?