Hello,

**Question 1)**

Suppose I have a **Tensor** of size **(A, B, C)** and a **mask** of size **(A, B)**. The mask only contains values of 1 or 0, indicating whether each example along dimension B of my tensor should be used.

In practice, I would like to select all the examples of my **Tensor** whose respective mask values are 1 and ignore the rest. After that, I would like to take the mean of the result with respect to dim=1, which I can do with torch.mean(tensor, dim=1).

Example:

embeddings.size() -> torch.Size([2, 3, 1024])

mask.size() -> torch.Size([2, 3])

mask -> tensor([[1, 1, 0], [1, 1, 1]])

I would like to have, for A = 0: embeddings[0, 0:2, :] | and for A = 1: embeddings[1, :, :], and then take the mean over dim=1.

Note: I know I can do this with for loops, but I would like to know a cleaner and more efficient way to do it.
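For reference, here is a sketch of the vectorized version I have in mind (I am not sure this is the most idiomatic approach): broadcast the mask over the embedding dimension, zero out the masked-out rows, and divide the sum by the count of kept rows.

```python
import torch

embeddings = torch.randn(2, 3, 1024)          # (A, B, C)
mask = torch.tensor([[1, 1, 0], [1, 1, 1]])   # (A, B)

# Broadcast the mask over the embedding dimension C, zero out the
# unwanted rows, then divide by the number of rows that were kept.
m = mask.unsqueeze(-1).float()                # (A, B, 1)
summed = (embeddings * m).sum(dim=1)          # (A, C)
counts = m.sum(dim=1).clamp(min=1)            # (A, 1); clamp avoids division by zero
mean = summed / counts                        # (A, C)
```

This should match torch.mean over dim=1 whenever a whole row of the mask is 1, and a mean over only the selected entries otherwise.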

**Question 2)**

All of this computation happens inside the `forward()` function of my model. All this slicing and averaging ends up in the grad_fn graph, is that right? Could it be influencing the backpropagation in a wrong way? As a concrete example, in my application the mean is taken over word embeddings to obtain a sentence embedding.
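To illustrate what I mean, here is a small check I ran (a minimal sketch with made-up sizes): the sliced-and-averaged result does carry a grad_fn, so I assume autograd is tracking these operations.

```python
import torch

# Toy tensor standing in for the embeddings produced inside forward().
emb = torch.randn(2, 3, 4, requires_grad=True)

# Slice a sub-range along dim=1 and average over it, as in my forward().
out = emb[:, 0:2, :].mean(dim=1)

# out.grad_fn is not None, so the slice and mean are part of the graph.
print(out.grad_fn)
```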

Thanks!