Plagiarism Detection

How can I build a plagiarism detection AI model when I am not loading all the data up front, but instead feeding it in as the process runs, so that the model learns the pattern the content author is following?
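One way to learn from data that arrives over time is scikit-learn's `partial_fit` API, which updates a model batch by batch without retraining from scratch. The sketch below is an illustration under stated assumptions, not a complete detector. It assumes labelled batches (0 = original, 1 = plagiarised) and uses `HashingVectorizer` because it needs no vocabulary fitted over the whole corpus. The helpers `update` and `predict` and the toy batches are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# HashingVectorizer is stateless: it needs no fit() over a full corpus,
# so new documents can be vectorized as they arrive.
vectorizer = HashingVectorizer(n_features=2**18, stop_words="english",
                               alternate_sign=False)
model = PassiveAggressiveClassifier(random_state=0)

# All class labels must be declared on the first partial_fit call.
CLASSES = np.array([0, 1])  # hypothetical labels: 0 = original, 1 = plagiarised


def update(texts, labels):
    """Update the model with one incoming batch of labelled texts (hypothetical helper)."""
    X = vectorizer.transform(texts)
    model.partial_fit(X, labels, classes=CLASSES)


def predict(texts):
    """Predict labels for new texts with the model trained so far."""
    return model.predict(vectorizer.transform(texts))


# Hypothetical toy batches standing in for data injected during the process.
batches = [
    (["the quick brown fox jumps over the lazy dog",
      "an entirely original essay about ocean currents"], [1, 0]),
    (["the quick brown fox jumped over a lazy dog",
      "fresh thoughts on renewable energy policy"], [1, 0]),
]
for texts, labels in batches:
    update(texts, labels)
```

Each new batch simply calls `update` again, so the model keeps adapting as more content is injected.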

  1. Can I use the Passive Aggressive Classifier algorithm?
  2. Should I apply tokenization and stopword removal?
  3. Is it necessary to include cosine similarity?
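Tokenization, stopword removal and cosine similarity can be combined in a few lines. The sketch below assumes a submission is compared against a list of reference texts and flagged when its TF-IDF cosine similarity reaches a threshold. The function `similarity_scores` and the `THRESHOLD` value of 0.5 are hypothetical choices, not tuned values.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def similarity_scores(submission, references):
    """Return the cosine similarity of `submission` to each reference (hypothetical helper)."""
    # TfidfVectorizer tokenizes, lowercases and (here) drops English stopwords.
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([submission] + references)
    # Row 0 is the submission; compare it against every reference row.
    return cosine_similarity(matrix[0:1], matrix[1:]).ravel()


THRESHOLD = 0.5  # hypothetical cut-off; tune it on labelled data

refs = ["the quick brown fox jumps over the lazy dog",
        "an essay about ocean currents and climate"]
scores = similarity_scores("the quick brown fox jumped over the lazy dog", refs)
flagged = [r for r, s in zip(refs, scores) if s >= THRESHOLD]
```

Cosine similarity is not strictly required by a classifier, but a similarity score against known sources is often a useful extra feature. This sketch refits the vectorizer on every call for simplicity; with streaming data, a stateless vectorizer like the one used for incremental training avoids that refit.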