Have a look at this post.
In that topic the user was dealing with different activities performed by different people.
Using a custom sampler, you could use a sliding window approach by providing the “invalid” frame indices, i.e. the frames the window should not grab images from.
Would the code example from that post be suitable as starter code?
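In case it helps, here is a minimal sketch of the sliding window idea. It is not the code from the linked post. The class name `SlidingWindowSampler` and its arguments are made up for illustration, and I'm assuming that "invalid" means a frame that must not appear anywhere inside a window (e.g. frames at the boundary between two activities or persons).

```python
import random


class SlidingWindowSampler:
    """Yields the start index of every window that contains no invalid frame.

    Hypothetical helper: a plain iterable with __iter__ and __len__.
    """

    def __init__(self, num_frames, window_size, invalid_indices, shuffle=False):
        invalid = set(invalid_indices)
        # Keep a start index only if none of the frames in
        # [start, start + window_size) is marked as invalid.
        self.valid_starts = [
            start
            for start in range(num_frames - window_size + 1)
            if not any((start + offset) in invalid for offset in range(window_size))
        ]
        self.shuffle = shuffle

    def __iter__(self):
        starts = list(self.valid_starts)
        if self.shuffle:
            random.shuffle(starts)
        return iter(starts)

    def __len__(self):
        return len(self.valid_starts)


# Made-up example: 10 frames, windows of 3 frames, frames 3 and 4 are invalid.
sampler = SlidingWindowSampler(num_frames=10, window_size=3, invalid_indices=[3, 4])
print(list(sampler))  # [0, 5, 6, 7]
```

The idea would be to pass an instance as the `sampler` argument of your `DataLoader` and let the `Dataset`'s `__getitem__` treat the received index as the window start, returning the frames `start` to `start + window_size - 1`. If your invalid indices are meant as forbidden start positions instead, the filter in `__init__` would need to be adapted.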
PS: Tagging certain people might discourage others from answering in your thread.