Finding the exact moments of firearm shots in a video

I have a video of a man shooting a firearm. I want to detect the exact moments (in seconds) at which he fires, using the video and, of course, the audio. One important detail: the man is not alone. Other people nearby are also firing, but the camera is pointed at him (some of the others may appear in the frame as well).
Does anyone have an idea how I can perform this task?

