Select data through a mask

Hello,

Question 1)

I have a tensor of size (A, B, C) and a mask of size (A, B). The mask contains only values of 1 or 0, indicating whether the corresponding entry along dimension B of my tensor should be used.

In practice, I would like to select, for each example, only the slices of my tensor whose mask value is 1 and ignore the rest. After that, I would like to take the mean over dim=1, which I can do with torch.mean(tensor, dim=1).

Example:

embeddings.size() -> torch.Size([2, 3, 1024])
mask.size() -> torch.Size([2, 3])
mask -> tensor([[1, 1, 0], [1, 1, 1]])

I would like to have, for A = 0: embeddings[0, 0:2, :], and for A = 1: embeddings[1, :, :], and then take the mean over dim=1.

Note: I know I can do this with for loops, but I would like to know a cleaner and more efficient way to do it.
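
For concreteness, this is the result I am after on the example above, written with explicit slicing (a minimal sketch, the variable names are just for illustration); each row should be averaged only over its unmasked word vectors:

import torch

embeddings = torch.randn(2, 3, 1024)         # (A, B, C)
mask = torch.tensor([[1, 1, 0], [1, 1, 1]])  # (A, B)

desired = torch.stack([
    embeddings[0, 0:2, :].mean(dim=0),       # average over the 2 unmasked vectors
    embeddings[1, :, :].mean(dim=0),         # average over all 3 vectors
])                                           # -> torch.Size([2, 1024])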

Question 2)

All of this computation happens inside the forward() function of my model. All of this slicing and the mean get recorded in the autograd graph (as the grad_fn of the results), is that right? Could it influence backpropagation in a wrong way? As a concrete example, in my application the mean is taken over word embeddings to obtain a sentence embedding.

Thanks!

mask = mask.unsqueeze(-1)                    # (A, B) -> (A, B, 1) so it broadcasts over C
mask_embeddings = embeddings * mask.float()  # zero out the masked positions
result = mask_embeddings.mean(dim=1)         # mean over dimension B

Does this fit your use case?

AFAIK, slicing and mean will work with autograd.
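
If you want to check it yourself, a quick sketch (shapes and names are just for illustration):

import torch

x = torch.randn(2, 3, 4, requires_grad=True)
y = x[:, 0:2, :].mean(dim=1)   # both the slice and the mean are recorded by autograd
print(y.grad_fn)               # a backward function object, so it is part of the graph
y.sum().backward()             # gradients flow back through the slice
print(x.grad.shape)            # torch.Size([2, 3, 4])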


Hello @MariosOreo, first, thanks for the answer.

It works, but mask_embeddings.mean(dim=1) still takes into account the vectors along dimension B that are zeroed out by the mask. Although they don't contribute to the sum, they still count in the divisor of the mean, right?

In the example I gave:

embeddings.size() → torch.Size([2, 3, 1024])
mask.size() → torch.Size([2, 3])
mask → tensor([[1, 1, 0], [1, 1, 1]])

Example 0 of the batch effectively has size torch.Size([2, 1024]), whereas example 1 has torch.Size([3, 1024]): I would like the mean for example 0 to be taken over its 2 word embeddings, and for example 1 over its 3 word embeddings.
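
A small numeric sketch of the issue (made-up values, just to show the divisor):

import torch

v = torch.tensor([[[2.0], [4.0], [6.0]]])    # (A=1, B=3, C=1)
m = torch.tensor([[1, 1, 0]]).unsqueeze(-1)  # keep only the first two vectors

print((v * m.float()).mean(dim=1))  # tensor([[2.]]) -> the sum 6 is divided by B=3
print(v[0, 0:2, :].mean(dim=0))     # tensor([3.])   -> divided by 2, which is what I want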


In your use case, I am afraid torch.mean() cannot meet your requirement.
You could try something like this:

sum_mask_embeddings = mask_embeddings.sum(dim=1)                # (A, C) masked sum over B
average_mask_embeddings = torch.empty_like(sum_mask_embeddings)
for i in range(mask.size(0)):                                   # loop over the batch dimension A
    average_mask_embeddings[i] = sum_mask_embeddings[i] / mask[i].sum()
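
If you prefer to avoid the Python loop, a vectorized sketch that should give the same result (the clamp is only a guard against rows whose mask is all zeros):

denom = mask.float().sum(dim=1, keepdim=True).clamp(min=1)    # (A, 1) number of unmasked vectors per row
average_mask_embeddings = mask_embeddings.sum(dim=1) / denom  # (A, C) masked mean over B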

If I don’t want this mean calculation to be part of the autograd process, can I wrap your code in torch.no_grad()?

with torch.no_grad():
    sum_mask_embeddings = mask_embeddings.sum(dim=1)
    average_mask_embeddings = torch.empty_like(sum_mask_embeddings)
    for i in range(mask.size(0)):
        average_mask_embeddings[i] = sum_mask_embeddings[i] / mask[i].sum()

Does this make sense? Do you think it can have a beneficial impact on my embeddings?

Yes, you are right. If you don’t want this part to take part in the autograd process, you can wrap it in a torch.no_grad() scope.

We often use torch.no_grad() to avoid certain errors (e.g. when modifying the parameters of the network in place). As for whether it will have a beneficial impact on your embeddings, I think that depends on your use case.
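
If you want to see what the wrapper changes, here is a minimal sketch (placeholder tensors, names just for illustration): anything computed inside torch.no_grad() has no grad_fn, so no gradient will flow back through that branch to the embeddings.

import torch

emb = torch.randn(2, 3, 4, requires_grad=True)

with torch.no_grad():
    inside = emb.sum(dim=1) / 3    # not recorded by autograd
outside = emb.sum(dim=1) / 3       # recorded by autograd

print(inside.requires_grad, inside.grad_fn)    # False None
print(outside.requires_grad, outside.grad_fn)  # True and a backward function object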
