Trying to use a Transformer for a simple contrived task doesn't work

As an experiment, I fed random binary sequences into a small transformer and trained the model to predict the value at position k+1 given the transformer output at position k. The sequences are summed with a learnt positional embedding, with the aim that the model learns to query the input sequence at the right position.
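To make the setup concrete, here is a minimal sketch of how the inputs and targets for this shift-by-one task could be built. It uses plain Python only; `make_example` is a hypothetical helper for illustration and is not taken from the linked repository.

```python
import random

def make_example(seq_len, rng):
    """Return (inputs, targets) for the shift-by-one task.

    targets[k] is inputs[k + 1]; the last input position has no target.
    """
    bits = [rng.randint(0, 1) for _ in range(seq_len)]
    inputs = bits
    targets = bits[1:]  # value at position k+1 for k = 0 .. seq_len-2
    return inputs, targets

rng = random.Random(0)
inputs, targets = make_example(8, rng)

# The model output at position k (for k < 7) is trained against targets[k].
for k, target in enumerate(targets):
    assert target == inputs[k + 1]
```

In this formulation the answer for position k is present in the input at position k+1, so the task amounts to copying from a neighbouring position.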

However, this does not work at all: the model does not learn anything.
I tried hunting for bugs, with no success.
I tried many hyperparameters, with no success.
I tried predicting the value at position k from the output at position k, and it works easily (probably because of the residual connection).
I tried replacing the Transformer with a feedforward net, and it works easily.
I can't find my bug! Or is this a difficult task for a transformer? That sounds weird.

The code can be found here: yotam-happy/TransformerBuffle on GitHub (main branch).
I would greatly appreciate it if anyone spots my bug or has an explanation :)

Random binary values are i.i.d.
Hence, you cannot predict one given the other samples.
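A small simulation can illustrate this point. It is only a sketch: `next_bit_accuracy` is a hypothetical helper, and it assumes the predictor sees only the bits before the one being predicted. It guesses each bit from the previous one using an online majority vote.

```python
import random
from collections import Counter

def next_bit_accuracy(n_bits, seed=0):
    """Predict bit k+1 from bit k using the most frequent successor seen so far."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(n_bits)]
    counts = {0: Counter(), 1: Counter()}  # counts[prev][next]
    correct = 0
    for prev, nxt in zip(bits, bits[1:]):
        seen = counts[prev]
        guess = 1 if seen[1] > seen[0] else 0  # online majority vote
        correct += guess == nxt
        seen[nxt] += 1  # update only after guessing, so no peeking
    return correct / (n_bits - 1)

accuracy = next_bit_accuracy(20_000)
print(f"accuracy: {accuracy:.3f}")  # expected to stay close to 0.5
```

Because each bit is independent of everything that came before it, no rule that looks only at earlier bits should do better than chance on average.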

I’m not sure I understand the use case completely, but were you able to predict the next number in a sequence of random numbers without any signal using a feed-forward model?
If so, I would check for a data leak, since, as @mMagmer described, I would expect all models to fail at predicting random numbers (unless the PRNG is giving you some information about the sequence).
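One way to check for such a leak is to test whether the targets can be read directly off the inputs at some fixed offset. The sketch below uses a hypothetical helper, `leak_offsets`, and assumes the model receives the full sequence with no causal mask, which the original post does not state.

```python
import random

def leak_offsets(inputs, targets, max_offset=3):
    """Return offsets d where targets[k] == inputs[k + d] for every valid k."""
    hits = []
    for d in range(-max_offset, max_offset + 1):
        pairs = [(targets[k], inputs[k + d])
                 for k in range(len(targets))
                 if 0 <= k + d < len(inputs)]
        if pairs and all(t == x for t, x in pairs):
            hits.append(d)
    return hits

rng = random.Random(0)
inputs = [rng.randint(0, 1) for _ in range(256)]
targets = inputs[1:]  # shift-by-one targets, as described in the question

print(leak_offsets(inputs, targets))  # offset +1 reproduces the targets exactly
```

If an offset like this shows up, the target is available in the input by construction, which would explain a feed-forward model over the whole sequence solving the task easily.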