My multi-dimensional Transformer doesn't seem to learn anything

Hi, I'm facing a similar issue. Mine gives equal probability to every output.