Glass Degradation Site Prediction

We are working on the subject above. Each example consists of n glass image frames with an associated target (x, y), where x and y are real numbers that give the location of the next breaking point.
The training set contains l such examples and the test set contains f.
We would like to know whether there is a model (LSTM, Transformer, etc.) that can be trained to predict the 2D coordinates of the next breaking point in the glass.
We were thinking about training two independent LSTMs, one for each coordinate, but we are not sure whether this will work.
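
For comparison with the two-LSTM idea, here is a minimal PyTorch sketch of one alternative: a single LSTM whose final hidden state feeds a linear layer with two outputs, so x and y are predicted jointly and trained with one mean-squared-error loss. Everything specific in it is an assumption made for illustration, not something taken from our data: the class name BreakPointRegressor, single-channel 32x32 frames, n = 8 frames per example, the small CNN used to turn each frame into a feature vector, the layer sizes, and targets scaled to [0, 1]. The tensors are random stand-ins, so the sketch only shows the shapes and one training step.

```python
import torch
from torch import nn


class BreakPointRegressor(nn.Module):
    """Hypothetical model: per-frame CNN encoder -> LSTM -> joint (x, y) head."""

    def __init__(self, feature_dim=64, hidden_dim=128):
        super().__init__()
        # Small CNN applied to every frame independently
        # (assumes single-channel images; sizes are arbitrary).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (batch * n, 32, 1, 1)
            nn.Flatten(),             # -> (batch * n, 32)
            nn.Linear(32, feature_dim),
        )
        # One LSTM reads the sequence of n frame features.
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        # One head with two outputs: x and y are predicted together.
        self.head = nn.Linear(hidden_dim, 2)

    def forward(self, frames):
        # frames: (batch, n, channels, height, width)
        b, n, c, h, w = frames.shape
        feats = self.encoder(frames.reshape(b * n, c, h, w))
        feats = feats.reshape(b, n, -1)      # (batch, n, feature_dim)
        _, (h_n, _) = self.lstm(feats)       # h_n: (1, batch, hidden_dim)
        return self.head(h_n[-1])            # (batch, 2)


torch.manual_seed(0)
model = BreakPointRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Random stand-in data: 4 examples, n = 8 frames each, 32x32 pixels.
frames = torch.randn(4, 8, 1, 32, 32)
targets = torch.rand(4, 2)  # assumed (x, y) scaled to [0, 1]

# One training step.
pred = model(frames)
loss = loss_fn(pred, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

With a shared network, both coordinates are computed from the same sequence representation, whereas two independent LSTMs would each have to learn the frame dynamics separately. Whether that helps on our data is something we would still have to check on the test set.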