# Select data through a mask

Hello,

Question 1)

Suppose I have a tensor of size (A, B, C) and a mask of size (A, B). The mask contains only 1s and 0s, indicating whether each entry along dimension B of the tensor should be used.

In practice, for each example I would like to select only the rows along dimension B whose mask value is 1 and ignore the rest. After that, I would like to take the mean over dim=1, which I can do with `torch.mean(tensor, dim=1)`.

Example:

```python
embeddings.size()  # torch.Size([2, 3, 1024])
mask               # tensor([[1, 1, 0], [1, 1, 1]])
```

I would like to have, for A = 0: `embeddings[0, 0:2, :]`, and for A = 1: `embeddings[1, :, :]`, and then take the mean over dim=1.

Note: I know I can do this with for loops, but I would like a cleaner and more efficient way.
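For reference, the for-loop version mentioned above could look like this (a sketch; the variable names `means` and `result` are mine, and the tensors are random stand-ins matching the example sizes):

```python
import torch

torch.manual_seed(0)
embeddings = torch.randn(2, 3, 1024)          # (A, B, C)
mask = torch.tensor([[1, 1, 0], [1, 1, 1]])   # (A, B)

# For each example, keep only the rows where mask == 1, then average them.
means = []
for a in range(embeddings.size(0)):
    selected = embeddings[a][mask[a].bool()]  # (num_valid, C)
    means.append(selected.mean(dim=0))
result = torch.stack(means)                   # (A, C)
```

This gives the desired per-example mean but loops in Python over the batch dimension.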

Question 2)

All of this calculation happens inside the `forward()` function of my model. All this slicing and averaging gets recorded in the `grad_fn` graph, is that right? Could it affect backpropagation in a wrong way? For a concrete example: in my application, the mean averages word embeddings into a sentence embedding.

Thanks!

```python
mask = mask.unsqueeze(-1)            # (A, B) -> (A, B, 1)
mask_embeddings = embeddings * mask  # broadcasts over C, zeroing masked rows
```

Does this fit your use case?

AFAIK, slicing and mean will work with autograd.


Hello @MariosOreo, first, thanks for the answer.

It works, but `mask_embeddings.mean(dim=1)` still counts the vectors along dimension B that are zeroed out by the mask. Although they don't contribute to the sum, they still count toward the divisor of the mean, right?

In the example I gave:

```python
embeddings.size()  # torch.Size([2, 3, 1024])
mask               # tensor([[1, 1, 0], [1, 1, 1]])
```

Example 0 of the batch should effectively have size `torch.Size([2, 1024])`, whereas example 1 has `torch.Size([3, 1024])`. I would like the mean for example 0 to be taken over its 2 word embeddings, and for example 1 over its 3 word embeddings.
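A small numeric check makes the problem concrete (a sketch with a small C for readability; `plain` and `correct` are my names): `mean(dim=1)` divides both examples by B = 3, so example 0, which has only 2 valid rows, comes out scaled by 2/3.

```python
import torch

torch.manual_seed(0)
embeddings = torch.randn(2, 3, 4)  # small C for illustration
mask = torch.tensor([[1, 1, 0], [1, 1, 1]], dtype=embeddings.dtype)

mask_embeddings = embeddings * mask.unsqueeze(-1)
plain = mask_embeddings.mean(dim=1)        # divides by B = 3 for every example
correct = embeddings[0, 0:2].mean(dim=0)   # divides by 2 for example 0

# plain[0] equals correct scaled by 2/3, confirming the division mismatch.
print(torch.allclose(plain[0], correct * 2 / 3))  # True
```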


In your use case, I am afraid `torch.mean()` cannot meet your requirement.
You could try this instead:

```python
sum_mask_embeddings = mask_embeddings.sum(dim=1)
mean_mask_embeddings = sum_mask_embeddings / mask.sum(dim=1)  # divide by the unmasked count
```

If I don’t want this mean calculation to take part in the autograd process, can I wrap your code in `torch.no_grad()`?

```python
with torch.no_grad():
    sum_mask_embeddings = mask_embeddings.sum(dim=1)
```

Yes, you are right. If you don’t want this part to take part in the autograd process, you can wrap it in a `torch.no_grad()` scope.
We often use `torch.no_grad()` to avoid certain errors (e.g. when modifying the parameters of the network in place). Whether it has beneficial effects on your embeddings depends on your use case.
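As a quick illustration of the `torch.no_grad()` semantics discussed above: operations inside the block are not recorded by autograd, so their outputs carry `requires_grad=False` (a minimal sketch with a toy tensor):

```python
import torch

x = torch.randn(2, 3, requires_grad=True)

y = x.mean(dim=1)
print(y.requires_grad)  # True: mean is recorded in the autograd graph

with torch.no_grad():
    z = x.mean(dim=1)
print(z.requires_grad)  # False: no graph is built inside no_grad
```

Note that because `z` is detached from the graph, no gradient will flow back through it to `x`.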