Looking for an architecture to find objects in an image

Hi. I’m making a tool to find detached faces in images. The plan is for it to be trained separately for each project: when such faces need to be found, the user marks up a small number of frames, and after training the neural network finds them in the remaining frames on its own. The input is an image and the output is 4 coordinates.

Now I am stuck on the choice of neural network architecture. I am a beginner and have worked through several of the courses and tutorials on the PyTorch website. Following those tutorials I have written a few simple neural networks myself. They run, but the results are poor and inaccurate. I have also searched for existing architectures, but what I found (YOLOX, for example) is very complicated and very far from what the PyTorch site teaches.

I’m asking for help finding an architecture that fits my needs. It doesn’t have to match my inputs and outputs exactly, since I can edit those, but it has to follow the standard structure below, otherwise I won’t be able to edit and use it:

from torch import nn


class NeuralNetwork(nn.Module):
    def __init__(self):
        ...

    def forward(self, x):
        ...
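
To make the request more concrete, here is a rough sketch of the kind of model that would fit this skeleton. It is only an illustration under assumptions, not a recommended architecture: the name `BoxRegressor`, the layer sizes, the 3-channel 128x128 input, and the choice to predict one box per image as normalized (x1, y1, x2, y2) values in [0, 1] are all hypothetical. The random tensors stand in for real frames and labels.

```python
import torch
from torch import nn


class BoxRegressor(nn.Module):
    """Hypothetical small CNN that regresses one box (x1, y1, x2, y2) per image."""

    def __init__(self):
        super().__init__()
        # Convolutional feature extractor; channel counts are arbitrary choices.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # makes the head independent of image size
        )
        # Regression head: 4 outputs squashed into [0, 1] by a sigmoid.
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(64, 4), nn.Sigmoid())

    def forward(self, x):
        return self.head(self.features(x))


model = BoxRegressor()
images = torch.rand(2, 3, 128, 128)  # placeholder batch of 2 RGB frames
target = torch.rand(2, 4)            # placeholder normalized box labels
boxes = model(images)                # tensor of shape (2, 4)

# One training-style step with a common box-regression loss.
loss = nn.functional.smooth_l1_loss(boxes, target)
loss.backward()
```

Something of this size is what I can read and modify. A pretrained backbone in place of `features` would also be fine, as long as the class keeps the same `__init__` / `forward` shape.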