How many classes does a binary classifier need?

I have 5 classes to predict. When I try to predict all of them using one model, it doesn’t work that well, so I decided to stack 5 binary classification models.

Say I want to predict class 1, and I have a folder with all class 1 images and another folder with classes 0, 2, 3, 4.

After training, the output of my model looks like:
class 0: 51.39% | class 1: 81.57%

My question is the following: is the model’s ability to predict class 0 (composed of classes 0, 2, 3, 4) important here?

I believe it would be beneficial for the model to see the other classes, but I can’t wrap my head around why it would be important to have 100% accuracy for both classes.

It depends a bit on your use case. Are you planning on using multiple stages?

stage0 -> class0 vs. rest
stage1 -> class1 vs. rest
stage2 -> class2 vs. rest

If so, then the “rest” class prediction would matter, since the following stages won’t be able to see a sample once an earlier stage has wrongly classified it.
Could you also explain a bit what the current outputs (51% and 81%) mean?

Well yes, you’re exactly right. I’m planning to implement a one-vs-all method. I was under the impression it would not matter, because if the model that classifies class 1 vs. rest does not detect the image as class 1, it would pass it to the next model.

The 51% and 81% outputs mean the following: the model detected class 1 as class 1 81% of the time and class 0 (composed of the rest, i.e. 0, 2, 3, 4) 51% of the time.
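
To make that definition concrete, here is a minimal sketch of how per-class numbers like these can be computed: for each true class, the fraction of its samples that were predicted correctly. The function name `per_class_accuracy` and the toy labels are made up for illustration and are not the actual data or evaluation code from this thread.

```python
from collections import defaultdict

def per_class_accuracy(y_true, y_pred):
    """For each true class, return the fraction of its samples predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {c: correct[c] / total[c] for c in sorted(total)}

# Toy binary labels (assumed): 1 = "class 1", 0 = "rest" (classes 0, 2, 3, 4)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
print(per_class_accuracy(y_true, y_pred))  # {0: 0.5, 1: 0.75}
```

Read this way, a low number for the "rest" class means many non-class-1 images are being claimed as class 1.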

If the class 1 vs. rest model misses a class 1 image and passes it on, the next model can only classify it as “not class 2”, which is correct for that stage, but of course you’ll never correctly classify this image as class 1.
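
A small sketch of why both kinds of error matter in a staged setup. It assumes each stage k outputs a single "this is class k" score and that a sample stops at the first stage whose score passes a threshold. The names `cascade_predict` and `argmax_predict` and all scores are hypothetical. Taking the argmax over all five scores is shown only as the usual one-vs-rest decision rule for comparison, not as something already used in this thread.

```python
from typing import Optional, Sequence

def cascade_predict(stage_scores: Sequence[float], threshold: float = 0.5) -> Optional[int]:
    """Staged decision: the first stage whose score passes the threshold claims the sample."""
    for class_id, score in enumerate(stage_scores):
        if score >= threshold:
            return class_id
    return None  # every stage answered "rest"

def argmax_predict(stage_scores: Sequence[float]) -> int:
    """Run all binary models and pick the class with the highest score."""
    return max(range(len(stage_scores)), key=lambda k: stage_scores[k])

# Made-up scores for two images whose true class is 1.
missed = [0.10, 0.40, 0.10, 0.20, 0.05]  # stage 1 wrongly says "rest"
stolen = [0.60, 0.90, 0.10, 0.10, 0.05]  # stage 0 wrongly claims the image first

print(cascade_predict(missed), argmax_predict(missed))  # None 1
print(cascade_predict(stolen), argmax_predict(stolen))  # 0 1
```

In the staged version, a false "rest" at stage 1 can never be recovered later, and a false positive at an earlier stage means stage 1 never sees the image at all.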