- Tanh may work better than Sigmoid as the activation for intermediate layers, since its output is zero-centered.
- Conv1d or a TransformerEncoder may provide better results, as games further back in time may have less impact on the outcome. Structure the data so that the input dims are something like [batch_size, num_games_in_season, 2], where the last dim holds (win/tie/loss, score ratio).
- You could encode the results of past games as Win = 1.0, Tie = 0.5, Loss = 0.0 for the inputs, and have the model output a probability distribution over the outcomes.
- Dropout on the intermediate layers may help. For a TransformerEncoder, it is set through the dropout argument of nn.TransformerEncoderLayer.
- A score ratio of loser score / winner score could be added as a second channel, with 0.5 used for a tie (that also prevents a divide by zero in the case of 0 / 0).
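
As a rough sketch of the encoding suggested above, something like the following could build the two per-game features. The names `encode_game` and `encode_season` are made up for illustration, and it assumes scores are non-negative and given as (own score, opponent score) pairs.

```python
def encode_game(own_score, opp_score):
    """Return (result, score_ratio) for one past game, from the team's point of view."""
    # Channel 1: Win = 1.0, Tie = 0.5, Loss = 0.0
    if own_score > opp_score:
        result = 1.0
    elif own_score < opp_score:
        result = 0.0
    else:
        result = 0.5

    # Channel 2: loser score / winner score, or 0.5 for a tie.
    # Handling ties first avoids 0 / 0; for a non-tie with
    # non-negative scores the winner's score is always > 0.
    if own_score == opp_score:
        ratio = 0.5
    else:
        ratio = min(own_score, opp_score) / max(own_score, opp_score)
    return result, ratio


def encode_season(games):
    """Encode a list of (own_score, opp_score) pairs as [num_games, 2] features."""
    return [list(encode_game(own, opp)) for own, opp in games]


# Hypothetical example season: a win, a tie, a 0-0 tie, and a loss
season = [(3, 1), (2, 2), (0, 0), (0, 4)]
features = encode_season(season)
```

Stacking one such list per sample gives the [batch_size, num_games, 2] layout. Note that Conv1d expects channels before the sequence dim, so the tensor would need to be permuted to [batch_size, 2, num_games] before going into it.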