Confusion about building a training dataset

I'm just getting started with deep learning and have a simple question.
I'm using a YOLOv3 network for object detection and classification.
The box position in a training label is expressed as a fraction of the original image size.
For example, if the size of the original image is H x W,
the label is: category, x/W, y/H, w_object/W, h_object/H.
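To make the label format concrete, here is a minimal Python sketch of the normalization as I understand it. The function name `to_yolo_label` and the example numbers are made up for illustration, and I'm assuming x and y refer to the box center, as in the usual Darknet label files.

```python
def to_yolo_label(category, box, image_w, image_h):
    """Convert a pixel-space box to a normalized YOLO-style label.

    box is (x, y, w_object, h_object) in pixels, with (x, y) assumed
    to be the box center.
    """
    x, y, w_object, h_object = box
    # Horizontal quantities are divided by W, vertical ones by H.
    return (
        category,
        x / image_w,
        y / image_h,
        w_object / image_w,
        h_object / image_h,
    )


# Hypothetical example: an image 3000 wide and 2500 high with one
# 20 x 20 target centered at (1500, 1250).
label = to_yolo_label(0, (1500, 1250, 20, 20), image_w=3000, image_h=2500)
print(label)
```

Every value after the category ends up in the range 0 to 1, so the label does not depend on the pixel resolution of the image.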
My problem is that the original image is very large (3000 x 2500), while the detection target is very small (20 x 20).
Here are my questions:
Can I crop a region out of the original image and use it as the training image?
Is there any requirement on the aspect ratio of the cropped image?
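To show what I mean by cropping, here is a rough Python sketch of how the labels would have to be recomputed relative to the crop. This is only my assumption of how it would work, not a confirmed recipe: the helper `crop_labels`, the 416 x 416 crop size, and the rule of dropping boxes that are not fully inside the crop are all choices I made up for the example.

```python
def crop_labels(labels, image_w, image_h, crop):
    """Re-express normalized labels relative to a crop window.

    labels: list of (category, x, y, w, h), normalized to the full image,
            with (x, y) assumed to be the box center.
    crop:   (left, top, crop_w, crop_h) in pixels of the full image.
    Returns labels normalized to the crop; boxes that are not fully
    inside the crop are dropped (an arbitrary choice for this sketch).
    """
    left, top, crop_w, crop_h = crop
    result = []
    for category, x, y, w, h in labels:
        # Convert back to pixel units in the full image.
        px, py = x * image_w, y * image_h
        pw, ph = w * image_w, h * image_h
        # Box corners in crop coordinates.
        x0, y0 = px - pw / 2 - left, py - ph / 2 - top
        x1, y1 = x0 + pw, y0 + ph
        if x0 < 0 or y0 < 0 or x1 > crop_w or y1 > crop_h:
            continue
        # Normalize by the crop size instead of the full image size.
        result.append(
            (category, (px - left) / crop_w, (py - top) / crop_h,
             pw / crop_w, ph / crop_h)
        )
    return result


# Hypothetical example: full image 3000 wide and 2500 high, one 20 x 20
# target at the center, and a 416 x 416 crop placed around it.
full_labels = [(0, 0.5, 0.5, 20 / 3000, 20 / 2500)]
cropped = crop_labels(full_labels, 3000, 2500, crop=(1292, 1042, 416, 416))
print(cropped)
```

In this example the target covers about 20/416 of the crop width instead of 20/3000 of the full image width, which is the reason I am considering cropping in the first place.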