Which loss function should I use? My networks collapse to all zeros

I have data with 200 elements, of which only one or two are non-zero (each less than 0.5), and a network should predict them based on the training data.
I have trained many different networks, but they usually collapse to all-zero weights, even though the MSE loss is as low as 0.0001.
What should I do?
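
To illustrate the symptom, here is a minimal sketch, assuming a single 200-element target with one non-zero entry. The value 0.14 and its position are made up for illustration and are not taken from my real data. It shows that a prediction of all zeros already gives an MSE on the order of 0.0001, so a low MSE alone does not show that the non-zero elements were learned.

```python
import numpy as np

# Hypothetical target: 200 elements, exactly one non-zero entry below 0.5.
# The value 0.14 and index 17 are illustrative assumptions.
target = np.zeros(200)
target[17] = 0.14

# What a collapsed network outputs: zeros everywhere.
prediction = np.zeros(200)

# Plain mean squared error over all 200 elements.
mse = np.mean((prediction - target) ** 2)

# Squared error on the non-zero element alone, for comparison.
nonzero_mask = target != 0
mse_nonzero = np.mean((prediction[nonzero_mask] - target[nonzero_mask]) ** 2)

print(f"MSE over all elements:      {mse:.6f}")
print(f"MSE over non-zero elements: {mse_nonzero:.6f}")
```

The error on the non-zero element is diluted by the 199 zeros that the all-zero prediction matches exactly, which is why the overall MSE looks so small.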