Training a model to distinguish licenses from non-licenses

Hello folks!

I am building an app that requires users to submit a picture of a license as part of their registration. This is actually a form of racing license, not a driver's license, but the two look somewhat similar. Eventually I'm hoping to train a model that identifies which state the license is from, but I don't yet have nearly enough images of these specific licenses to do so. As a first attempt, I am instead planning to train a model that simply distinguishes license-like images from other images. I am familiar with the theory underpinning machine learning and am reasonably comfortable with PyTorch syntax. However, there are some more practical questions that I don't have answers to:

  1. What would be a good model architecture for this?
  2. Will transfer learning be helpful, or should I train a model from scratch?
  3. Roughly how many images of license-like and not-license-like things should I gather?
  4. How much will image augmentation help reduce the required number of images?
  5. What ratio of license to not-license images should be in the training data?
  6. What should not-license images look like? I know that this depends on what phony images users are going to submit, but should I restrict not-licenses to be document-like images, or include images of anything and everything?
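
To make question 5 concrete: if the two classes end up imbalanced, I assume one option is to compensate by weighting the loss with inverse class frequencies instead of balancing the dataset itself. Here is a minimal sketch of that calculation in plain Python. The counts are made up for illustration, and `inverse_frequency_weights` is just a hypothetical helper name, not something from a library.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight}, where weight = N / (K * n_c).

    N is the total number of samples, K the number of classes,
    and n_c the number of samples in class c. A perfectly balanced
    dataset gives every class a weight of 1.0.
    """
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}


# Hypothetical dataset, purely for illustration:
# 200 license images and 800 not-license images.
labels = ["license"] * 200 + ["not_license"] * 800
weights = inverse_frequency_weights(labels)
print(weights)
```

If I understand correctly, weights like these could then be passed as the `weight` tensor of `torch.nn.CrossEntropyLoss`, so the rarer class counts for more in the loss. Whether that is better than simply collecting a balanced dataset is part of what I am asking.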

I greatly appreciate any help on this. It is going to be a fun project!