I have a tensor of size (A, B, C) and a mask of size (A, B). The mask contains only values of 1 or 0, indicating whether each entry along dimension B of my tensor should be used.
In practice, I would like to select only the rows of my (A, B, C) tensor whose mask value is 1 and ignore the rest. After that, I would like to take the mean of the selected rows over dim=1, which I can normally do with torch.mean(tensor, dim=1).
embeddings.size() -> torch.Size([2, 3, 1024])
mask.size() -> torch.Size([2, 3])
mask -> tensor([[1, 1, 0], [1, 1, 1]])
I would like to have, for A = 0: embeddings[0, 0:2, :] | and for A = 1: embeddings[1, :, :], and then take the mean over dim=1.
Note: I know I can do it with for loops, but I want to know a cleaner and more efficient way to do this.
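One loop-free way to do this (a sketch, not necessarily the only approach) is to broadcast the mask over the embedding dimension, zero out the masked rows, and divide the per-batch sum by the per-batch count of kept rows. The shapes below match the example in the question; the variable names are mine:

```python
import torch

# Example shapes from the question: (A, B, C) embeddings, (A, B) mask
embeddings = torch.randn(2, 3, 1024, requires_grad=True)
mask = torch.tensor([[1, 1, 0], [1, 1, 1]], dtype=embeddings.dtype)

# Broadcast the mask to (A, B, 1) so it multiplies every channel,
# zeroing the embeddings whose mask value is 0.
masked = embeddings * mask.unsqueeze(-1)       # (A, B, C)

# Divide by the number of kept rows per batch element, not by B,
# so the result is a true mean over the unmasked entries.
counts = mask.sum(dim=1, keepdim=True)         # (A, 1)
mean = masked.sum(dim=1) / counts              # (A, C)
```

If a row of the mask could be all zeros, you would want to clamp `counts` (e.g. `counts.clamp(min=1)`) to avoid dividing by zero.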
All of this calculation happens inside the forward() function of my model. All this slicing and averaging gets recorded in the autograd graph via grad_fn, is that right? Could it affect backpropagation in a wrong way? For a concrete example: in my application, this mean takes word embeddings and averages them into a sentence embedding.
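To see that autograd handles a masked mean correctly, here is a small check (using the multiply-sum-divide formulation as one possible implementation, with toy shapes): the gradient flows only into the unmasked word embeddings, and the masked-out positions receive exactly zero gradient.

```python
import torch

embeddings = torch.randn(2, 3, 4, requires_grad=True)
mask = torch.tensor([[1., 1., 0.], [1., 1., 1.]])

# Masked mean over dim=1 (one way to implement it)
sentence = (embeddings * mask.unsqueeze(-1)).sum(dim=1) / mask.sum(dim=1, keepdim=True)

# Backpropagate a scalar loss through the sentence embeddings
loss = sentence.sum()
loss.backward()

# The masked word (batch 0, position 2) gets zero gradient;
# the kept words each get 1/count, here 1/2 for batch 0.
print(embeddings.grad[0])
```

Since every operation (broadcasting, sum, division) is differentiable, the slicing/masking itself does not corrupt backpropagation; it just routes gradient only to the selected words.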