Global averaging of nonzero vectors from a 4D feature map

I want to average the nonzero vectors over the image resolution (the spatial dimensions).
For example,
there is a feature map x of shape (4, 81, 3, 3) and a mask y of shape (4, 1, 3, 3).
I can generate the masked features with x * y.

Now I want to average only the masked (nonzero) vectors over dims 2 and 3, so that the output has shape (4, 81, 1, 1).
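
To make the target computation concrete, here is a minimal sketch of what I mean. It assumes y is a binary mask (1 = keep, 0 = ignore); the helper name masked_global_avg and the eps guard for fully masked samples are my own illustrative choices, not an existing PyTorch API.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 81, 3, 3)                 # feature map: (batch, channel, H, W)
y = (torch.rand(4, 1, 3, 3) > 0.5).float()   # assumed binary mask: (batch, 1, H, W)

def masked_global_avg(x, y, eps=1e-8):
    """Hypothetical helper: mean over H and W of the positions where y == 1."""
    # Sum of the kept vectors per sample and channel -> (B, C, 1, 1)
    total = (x * y).sum(dim=(2, 3), keepdim=True)
    # Number of kept positions per sample -> (B, 1, 1, 1)
    count = y.sum(dim=(2, 3), keepdim=True)
    # clamp avoids division by zero when a sample has no kept position
    return total / count.clamp(min=eps)

out = masked_global_avg(x, y)
print(out.shape)  # torch.Size([4, 81, 1, 1])
```

Dividing by the mask count instead of H * W is what separates this from a plain x.mean(dim=(2, 3)): zeroed-out positions do not pull the average toward zero.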

I tried torch.masked_index, but the result cannot be reshaped to (batch, channel, -1).
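
A small sketch of the shape problem, using torch.masked_select as a stand-in (an assumption: it may not be the exact call referred to above). Boolean selection returns a flat 1-D tensor, and each sample can keep a different number of positions, so the result generally has no regular (batch, channel, -1) layout. The mask values below are made up for illustration.

```python
import torch

x = torch.randn(4, 81, 3, 3)
y = torch.zeros(4, 1, 3, 3)
y[0, 0, 0, :2] = 1   # sample 0 keeps 2 positions
y[1, 0, :, :] = 1    # sample 1 keeps all 9 positions
y[2, 0, 1, 1] = 1    # sample 2 keeps 1 position
                     # sample 3 keeps none

# The mask is broadcast over the channel dimension; the output is flattened.
selected = torch.masked_select(x, y.bool())
print(selected.shape)         # torch.Size([972]), i.e. 81 * (2 + 9 + 1)
print(y.sum(dim=(1, 2, 3)))   # tensor([2., 9., 1., 0.])
```

Because the per-sample counts are 2, 9, 1 and 0, the 972 selected values cannot be viewed as (4, 81, k) for any single k.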

Could you suggest a solution?