The weird case of CNN with multiple binary classifiers in the final layer

Hello PyTorch community!

I’m having this small idea as part of a much bigger project to train a CNN that has multiple binary classifiers at the final layer. It might sound weird but it’s actually inspired by MDNet [1] training procedure for visual tracking. I have a few questions regarding training such a network for multi-class image classification. Here are some details beforehand:

  1. So for a 10 class MNIST dataset, we would have 10 separate binary classifiers (binary softmax) instead of a single 10-way softmax classifier.
  2. binary classifier 1 is responsible to classify ‘1’ images (positive samples) and other samples as non ‘1’ (negative samples)
  3. every layer and parameter before binary classifiers are shared for all batches.

So here are my questions about loading and feeding data:

  1. How would you partition your data to positive and negative samples for training?
    My initial answer to this would be to take ‘p’ positive and ‘n’ negative samples (images) for classifier ‘i’ at each batch. So batch 1 for classifier 1, batch 2 for classifier 2, and so on!
  2. How would you write a routine for PyTorch to do this? my initial answer was to write a complicated routine of choosing ‘p’ number of positive images and ‘n’ number of negative samples at each batch. Bu then my code got very complicated and I thought there might be easier answers for this.
    Another way would be to make 10 copies of the dataset and label each based on the positive and negative samples. So a copy of MNIST with ‘1’ images labeled as one and the rest as zero; another copy of MNIST with ‘2’ images labeled as one and the rest as zero; and so on. Then write a data loader method to extract for example 128 images for one batch from the first copy and train the CNN and binary classifier 1; then next 128 images would be from dataset copy 2 and classifier 2, and so on.
    NOTE: then I thought for the first batch of 128 images there might not be any positive samples (images of ‘1’). Does this make a problem?

I would love to hear your opinion on this. Also if there are any topics or researches similar to this, I would appreciate it if you could refer me to it.