I’ve already responded to your cross-post and mentioned that I don’t know which approach would work best for your use case (here). I would also recommend not tagging specific users, as it might discourage others from posting a valid answer.
Generally, you could try to overfit a small dataset (e.g. just 10 samples) and make sure your model is able to learn these samples (almost) perfectly. If that’s not the case, your model architecture or overall training code could contain errors, or the model might just not be suitable for the task.
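
As a rough illustration of this sanity check, here is a minimal, framework-free sketch in plain Python. Everything in it is an assumption made for the example and not taken from your code: the toy data (10 random samples with random binary labels), the tiny logistic-regression "model", and the names `overfit_check`, `num_features`, `steps`, and `lr`. In your case you would run your own model and training loop on a 10-sample subset of your data instead.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def overfit_check(num_samples=10, num_features=20, steps=2000, lr=0.5, seed=0):
    """Train on a handful of samples and report the loss and accuracy
    measured on those same samples (hypothetical helper)."""
    rng = random.Random(seed)

    # Toy stand-in for a tiny subset of a real dataset (assumption):
    # random features with random binary labels.
    xs = [[rng.gauss(0.0, 1.0) for _ in range(num_features)]
          for _ in range(num_samples)]
    ys = [rng.randint(0, 1) for _ in range(num_samples)]

    # Logistic-regression parameters, initialized to zero
    w = [0.0] * num_features
    b = 0.0

    def predict(x):
        return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

    for _ in range(steps):
        # Full-batch gradient of the mean binary cross-entropy loss
        grad_w = [0.0] * num_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = predict(x) - y
            for j in range(num_features):
                grad_w[j] += err * x[j] / num_samples
            grad_b += err / num_samples
        # Plain gradient-descent update
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b

    # Evaluate on the training samples themselves
    eps = 1e-12
    loss = 0.0
    correct = 0
    for x, y in zip(xs, ys):
        p = min(max(predict(x), eps), 1.0 - eps)
        loss -= (y * math.log(p) + (1 - y) * math.log(1.0 - p)) / num_samples
        correct += int((p >= 0.5) == (y == 1))
    return loss, correct / num_samples


if __name__ == "__main__":
    final_loss, accuracy = overfit_check()
    print(f"loss on the 10 training samples: {final_loss:.4f}, accuracy: {accuracy:.2f}")
```

If the training loss on these few samples does not drop close to zero (and the accuracy does not reach roughly 100%), I would look for bugs in the data loading, loss computation, or optimizer setup before scaling up to the full dataset.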