Class activation mapping for videos

Does anyone know how to do class activation mapping for a video (i.e., during classification for action detection)? I’m using a resnet3d50 architecture as the pre-trained model.
Thanks!

Hi there, I saw your question and I'd like to do this too. Did you find a method? Have you tried running Grad-CAM on an existing video classification network, and does that work? I was also wondering whether you could run frame-wise sliding windows of a video through PyTorchVideo and get a class activation map for at least one representative frame in each window. We could then stack all the maps to get a spatiotemporal blob. Do you think this would work?