How to predict the label/probability for an input of multiple similar views of an object with a weight-sharing model?

I want to do binary classification on a particular object of which I have n images taken at different angles. I want to know whether the object contains a certain feature, and that feature dominates the classification step: if it is visible in at least one of the n views, the model should classify the input as “containing the feature”. There is a single label for all n images (viewing angles). The images respect a certain symmetry, so the views can be interchanged without affecting the output. For example, the object could be a cylinder and the views obtained by rotating around its central axis, so all n views look similar.

To classify, I want to do the following:

  1. Use the same model for each view (weight sharing),
  2. then combine the n outputs of that model with a fixed voting rule.

Step 1 seems reasonable since all my object views are similar.

Now, I don’t know what the most efficient approach is for the voting part. I suspect I should not learn it, since I already know the voting condition. But is it better to output a hard label from the voting, or can the voting produce continuous values that feed into a binary cross-entropy loss?
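For concreteness, here is a minimal sketch of the setup I have in mind, assuming PyTorch (the tiny backbone is just a placeholder). Since “the feature is visible in at least one view” amounts to a max over the per-view scores, and max is differentiable, the voting could output a continuous value for binary cross-entropy:

```python
import torch
import torch.nn as nn

class SharedViewClassifier(nn.Module):
    """One backbone applied to every view (weight sharing),
    then a max over the per-view logits as the 'voting' step."""

    def __init__(self):
        super().__init__()
        # Placeholder backbone; any CNN producing one logit per view works.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 1),  # one logit per view
        )

    def forward(self, x):
        # x: (batch, n_views, channels, height, width)
        b, n, c, h, w = x.shape
        logits = self.backbone(x.view(b * n, c, h, w)).view(b, n)
        # "feature in at least one view" == max over views;
        # max is differentiable, so BCE can train this end to end.
        return logits.max(dim=1).values  # (batch,)

model = SharedViewClassifier()
x = torch.randn(4, 6, 3, 64, 64)            # 4 objects, 6 views each
labels = torch.randint(0, 2, (4,)).float()  # one label per object
loss = nn.BCEWithLogitsLoss()(model(x), labels)
loss.backward()
```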

Thank you in advance for your help.

A better way to solve this might be a Multi-View CNN (MVCNN). It takes multiple images of the same object from different views, passes each through the same network (exactly your point 1), pools across the views, and then produces the final output. It should fit your use case well.
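Roughly, an MVCNN-style model pools the per-view features before a single classifier head. Here is a minimal sketch of that idea, assuming PyTorch; the tiny backbone and layer sizes are placeholders, not the architecture from the MVCNN paper:

```python
import torch
import torch.nn as nn

class MVCNN(nn.Module):
    """MVCNN-style model: shared backbone per view,
    max pooling over view features, then one classifier head."""

    def __init__(self, feat_dim=32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, feat_dim), nn.ReLU(),
        )
        self.head = nn.Linear(feat_dim, 1)  # binary logit

    def forward(self, x):
        # x: (batch, n_views, channels, height, width)
        b, n, c, h, w = x.shape
        feats = self.backbone(x.view(b * n, c, h, w)).view(b, n, -1)
        pooled = feats.max(dim=1).values  # view pooling, order-invariant
        return self.head(pooled).squeeze(1)
```

Max pooling across views makes the prediction invariant to the order of the views, which matches the symmetry you describe.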

Thank you, that is interesting and I will test it.