How can I classify words spoken by one person?

For example, a subject says "dog" or "cat", and I want to classify which word was spoken from the voice recording.

I would recommend the speech commands tutorial as a starting point:

https://pytorch.org/tutorials/intermediate/speech_command_recognition_with_torchaudio.html
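
To give a rough idea of what that kind of setup looks like, here is a minimal sketch of a raw-waveform word classifier in plain PyTorch. It is not the tutorial's code: the class name `TinyWordClassifier`, the two-word label list, the layer sizes, and the 8 kHz one-second clips are all assumptions for illustration, and the random tensors are only stand-ins for real recordings of your speaker.

```python
import torch
import torch.nn as nn

# Assumed vocabulary for illustration; replace with your own words.
LABELS = ["cat", "dog"]


class TinyWordClassifier(nn.Module):
    """Small 1D CNN over raw audio (hypothetical sizes, not the tutorial model)."""

    def __init__(self, n_classes=len(LABELS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=80, stride=16),  # wide first filter on raw samples
            nn.BatchNorm1d(16),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=3),
            nn.BatchNorm1d(32),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # average over time so clip length can vary
        )
        self.fc = nn.Linear(32, n_classes)

    def forward(self, waveform):
        # waveform: (batch, 1, num_samples)
        x = self.features(waveform)    # (batch, 32, 1)
        return self.fc(x.squeeze(-1))  # (batch, n_classes) logits


def predict(model, waveform):
    """Return the predicted word for one mono clip of shape (1, num_samples)."""
    model.eval()
    with torch.no_grad():
        logits = model(waveform.unsqueeze(0))
    return LABELS[logits.argmax(dim=-1).item()]


torch.manual_seed(0)
model = TinyWordClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on random stand-in data (4 clips, 1 second at an assumed 8 kHz).
batch = torch.randn(4, 1, 8000)
targets = torch.tensor([0, 1, 0, 1])  # indices into LABELS
model.train()
output = model(batch)
loss = nn.functional.cross_entropy(output, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Classify a single (random) clip.
clip = torch.randn(1, 8000)
word = predict(model, clip)
```

With real data you would load labelled recordings of the one speaker, resample them to a common rate, and loop the training step over many batches. Since all clips come from a single person, it is worth holding out some recordings for validation to check that the model is not simply memorising them.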

Best regards

Thomas