How do you use the weights/trained data in an actual custom script from yolov5?

Hey,

Read the getting started guides etc., and got to creating a custom dataset to detect a specific object in a video (mp4). I cloned the repo, collected lots of images, annotated them in YOLOv5 format, and ran the train.py script, which gave me a pretty good weights file (the one in \yolov5\runs\train\yolov5s_resultsxxx\weights\best.pt) that was quite accurate when used with detect.py on an mp4 file.

I've looked at countless blogs and videos, and everyone just shows how to use detect.py, not actual custom Python code that uses the files generated by train.py. I even tried to dissect the detect.py file, but it's really a mess to understand for a torch beginner. Does anyone know of an example of how to load that best.pt file created by train.py? Basically I want to run detection on live video, frame by frame, so I can't use detect.py, and the program needs to work on the bounding boxes that are detected. There don't seem to be any examples of actual use beyond the simple train.py → detect.py workflow, usually done in Colab, which isn't practical for real-world use. I'm really confused about how this is supposed to work.

I read the parts about saving/loading state dicts etc., but isn't that what train.py already did for the model? When I try to load the best.pt that train.py generated, it just complains that it's not a valid file to load (despite it working fine with detect.py).

Hope someone can point me in the right direction on how to use the weights/model generated by the YOLOv5 tooling. The docs make no sense to me when it comes to moving that trained model into a real-world standalone script/app, outside the provided scripts and Colab notebooks.

Hey @Michael_Jensen

Your confusion as a beginner is totally understandable. Could you share which repo you are trying to run?

Without that, I'd say you need to find the following in the code:

  1. the class that implements the PyTorch model
  2. where the image preprocessing happens (inside the model or in some external function)
  3. then write your own script that takes frames, preprocesses them, and passes them to the model one by one
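If the repo is ultralytics/yolov5, one common way to cover all three steps is to load your best.pt through torch.hub with the "custom" entry point, which wraps the model together with its preprocessing so you can feed it raw frames. A minimal sketch, assuming torch and opencv-python are installed and the weights path matches your training run (the path and the confidence threshold here are illustrative):

```python
# Sketch: load YOLOv5 weights produced by train.py and run them on live
# video frame by frame. Assumes the ultralytics/yolov5 repo; the weights
# path and threshold below are examples, not fixed values.

def boxes_above_confidence(detections, threshold=0.5):
    """Keep (x1, y1, x2, y2, conf, cls) rows whose confidence meets threshold."""
    return [d for d in detections if d[4] >= threshold]

def run_live_detection(weights_path="runs/train/yolov5s_results/weights/best.pt",
                       source=0):
    # Imports are local so the helper above stays importable without torch/cv2.
    import cv2
    import torch

    # "custom" tells the yolov5 hub entry point to load your own best.pt
    # (a plain torch.load on best.pt fails because it is a checkpoint dict,
    # not a bare state_dict).
    model = torch.hub.load("ultralytics/yolov5", "custom", path=weights_path)

    cap = cv2.VideoCapture(source)  # 0 = webcam, or a path to an .mp4
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # OpenCV gives BGR; the hub model expects RGB for numpy inputs.
        results = model(frame[:, :, ::-1])
        # results.xyxy[0] is an N x 6 tensor: x1, y1, x2, y2, confidence, class
        detections = results.xyxy[0].tolist()
        for x1, y1, x2, y2, conf, cls in boxes_above_confidence(detections):
            print(f"class {int(cls)} at ({x1:.0f}, {y1:.0f}) conf {conf:.2f}")
    cap.release()
```

The first call to torch.hub.load downloads the yolov5 repo into the hub cache, so it needs network access once; after that it runs locally. The per-frame results object also exposes .pandas().xyxy[0] if you prefer a DataFrame.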