Classification of moving pixels

I have a data set of videos of moving pixels. Each video has 32 frames of 32x32 pixels, with two white pixels per frame and the rest black. Human annotators have assigned binary labels to 800 of these videos, and I now want to train a classifier that generalizes from this labeled set. I also have access to the algorithm that generates the pixel movements. Each white pixel moves 1 pixel in any direction per frame, and the two pixels never overlap.
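For concreteness, here is a minimal sketch of the data layout, assuming each video is stored as a NumPy array of shape (32, 32, 32) ordered as (frame, row, column). Since every frame has only two white pixels, the same video can also be written as per-frame coordinates, which may matter when choosing a model. The function name `video_to_coords` and the tiny hand-made example are hypothetical, not part of the actual generator. Note that the two pixels are listed in row-major order within each frame, so this sketch does not track which pixel is which over time.

```python
import numpy as np

def video_to_coords(video):
    """Convert a (T, H, W) binary video with two white pixels per frame
    into a (T, 2, 2) array of (row, col) coordinates."""
    video = np.asarray(video)
    coords = np.empty((video.shape[0], 2, 2), dtype=np.int64)
    for t, frame in enumerate(video):
        # argwhere returns the white pixels in row-major order,
        # so pixel identity across frames is NOT preserved here.
        pts = np.argwhere(frame > 0)
        if pts.shape != (2, 2):
            raise ValueError(f"frame {t} has {len(pts)} white pixels, expected 2")
        coords[t] = pts
    return coords

# Hand-made 3-frame example (assumed layout, not real data).
video = np.zeros((3, 32, 32), dtype=np.uint8)
video[0, 5, 5] = video[0, 10, 20] = 1
video[1, 5, 6] = video[1, 11, 20] = 1
video[2, 6, 6] = video[2, 11, 21] = 1

coords = video_to_coords(video)
print(coords.shape)
```

In this form a full video is 32 x 4 integers instead of 32 x 32 x 32 mostly black pixels.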

My understanding is that 3D CNNs are a good match for this type of problem, but I have mainly seen them used for “dense” videos, and my data is very sparse in comparison. I have also seen suggestions to use LSTMs or two-stream architectures.

Which approach is best suited to this type of data?

I would very much appreciate any suggestions, thank you.