I’m training a simple model on MNIST data where my goal is to obtain the worst possible test accuracy.

Intuitively, given that my model reaches 99% accuracy when it's trained properly, the worst possible accuracy should be at most 1%. I can obtain this trivially by picking some label other than my model's prediction at inference time.
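For concreteness, the inference-time trick I have in mind looks roughly like this (a minimal sketch assuming the model outputs a NumPy array of per-class logits; `worst_case_predict` is just a name I made up):

```python
import numpy as np

def worst_case_predict(logits):
    """Given per-class logits of shape [n_samples, n_classes], return
    for each sample a label guaranteed to differ from the argmax."""
    preds = logits.argmax(axis=1)
    n_classes = logits.shape[1]
    # Shifting each prediction by one class (mod n_classes) always
    # yields a label different from the model's own prediction.
    return (preds + 1) % n_classes

logits = np.array([[0.1, 2.0, -1.0],
                   [3.0, 0.2, 0.5]])
print(worst_case_predict(logits))  # prints [2 1]
```

Since the model is right 99% of the time, deliberately answering something else is wrong at least 99% of the time, i.e. at most 1% accuracy.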

However, how can I train the model to do this, instead of tampering with its inference phase? I naively attempted to negate the output of my loss function (negative log-likelihood), with the goal of maximizing the training loss. I ended up with a model that always predicts the same digit and therefore sits at 10% accuracy.
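In spirit, the negation I tried amounts to the following (a pure-Python sketch, not my actual training code), which also hints at why it misbehaves: the negated NLL is unbounded below, so the optimizer can drive it toward minus infinity instead of finding a "maximally wrong" classifier:

```python
import math

def negated_nll(probs, target):
    """Negated negative log-likelihood for one sample:
    -(-log p[target]) = log p[target].
    Minimizing this pushes p[target] toward 0 with no lower bound."""
    return math.log(probs[target])

# As p[target] -> 0 this 'loss' -> -inf: the objective is unbounded
# below, so gradient descent can keep decreasing it forever, typically
# collapsing onto a single degenerate prediction.
print(negated_nll([0.5, 0.5], 0))    # log(0.5), about -0.693
print(negated_nll([1e-6, 1.0 - 1e-6], 0))  # about -13.8
```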

So, long story short: what's the proper way to train a model whose goal is to obtain the worst possible test accuracy? Should I pick a loss function that is bounded (both from above and below) and negate that instead? If so, what's a suitable candidate?