I have a 2D tensor and a corresponding 2D boolean mask that tells me which positions are valid for random masking. All other values in the tensor need to stay the same.
In the next step I want to randomly select 1 to N elements from each row of my tensor and change their values to a “mask value”, where N is the number of valid elements in that row.
I have code that does this for each row individually, but I wonder whether there is an efficient way to do it for all rows at once?
Can the following code be adapted to work with 2D tensors instead of 1D tensors?
(“tensor” is my 1D tensor, “mask” is the corresponding 1D boolean mask of valid positions)
import random
import torch

# get indices of True values in mask
mask_idx = mask.nonzero().flatten()
N = mask_idx.numel()       # number of valid positions
M = random.randint(1, N)   # number of positions to mask (uniform over 1..N)
# randomly select M of the N valid positions
idx_perm = torch.randperm(N)[:M]
# rewrite mask to flag only the randomly selected valid positions
mask.fill_(False)
mask[mask_idx[idx_perm]] = True
# overwrite the selected values in tensor
tensor[mask] = mask_value
In addition, I need to know the values of N and M for each row, though those can also be calculated from the row-sums of the corresponding mask before and after running the code above.
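For reference, here is one way such a batched version could be sketched. The example data (`tensor`, `mask`, `mask_value`) is made up, and the sort-based selection trick is just one possible approach, not necessarily the best one: give every valid position a random score, rank positions within each row, and keep the M_i best-ranked positions, where M_i is drawn uniformly from 1..N_i per row. It assumes every row has at least one valid position.

```python
import torch

torch.manual_seed(0)

# hypothetical example data
mask_value = -1.0
tensor = torch.arange(12, dtype=torch.float32).reshape(3, 4)
mask = torch.tensor([[True, False, True, True],
                     [False, True, True, False],
                     [True, True, True, True]])

N = mask.sum(dim=1)                       # valid positions per row
# draw M_i uniformly from {1, ..., N_i} for each row
M = (torch.rand(N.shape) * N).long() + 1

# random score per valid position; invalid positions get -inf so they rank last
scores = torch.rand(mask.shape).masked_fill(~mask, float("-inf"))
# rank of each position within its row, by descending score
ranks = scores.argsort(dim=1, descending=True).argsort(dim=1)
# keep the M_i best-ranked (i.e. randomly chosen) valid positions per row
selected = ranks < M.unsqueeze(1)

masked_tensor = tensor.masked_fill(selected, mask_value)
```

Because M_i never exceeds N_i, the rank test can only ever select valid positions, and N and M are available per row as a by-product.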
*) The proviso has to do with the probability of masking a specific
number of elements in a given row.
For example, the index-2 row (the “third” row) of the mask tensor in
my example script has four valid mask locations. In your example
code, the probability of masking three of those locations is the same
as the probability of masking four of those locations (namely 25% in
both cases), while in my version, the probability of masking three
locations is higher than that of masking four locations.
It would depend on your use case as to whether this difference in
probabilities matters.
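As a hedged illustration of how such a difference can arise (the per-row version itself isn't shown here, so this is only an assumption about its mechanism): if each of the four valid locations were masked independently with probability p, the number of masked locations would follow a binomial distribution, under which masking three is more likely than masking four whenever p < 0.8.

```python
from math import comb

# hypothetical: each of the 4 valid locations masked
# independently with probability p -> count is Binomial(4, p)
p = 0.5
P3 = comb(4, 3) * p**3 * (1 - p)  # probability of masking exactly 3 -> 0.25
P4 = comb(4, 4) * p**4            # probability of masking exactly 4 -> 0.0625
print(P3, P4)
```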
(As an aside, I suspect that there is a way to get the same probability
for masking any given number of valid mask locations, but I do not
know how to do it. My guess is that – if possible – it would take some
careful and clever work with probability distributions to do it.)
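For what it's worth, the two-step scheme from the question (draw M uniformly with randint, then pick M positions with randperm) does give every count of masked locations the same probability for a single row; a quick Monte Carlo check (with made-up trial counts) illustrates this for a row with four valid locations:

```python
import random
from collections import Counter

import torch

random.seed(0)
torch.manual_seed(0)

N = 4          # four valid mask locations, as in the index-2 row
trials = 20000
counts = Counter()

for _ in range(trials):
    M = random.randint(1, N)         # uniform over {1, 2, 3, 4}
    chosen = torch.randperm(N)[:M]   # which positions get masked
    counts[len(chosen)] += 1

# each count 1..4 should occur with frequency close to 1/4
for k in range(1, N + 1):
    print(k, counts[k] / trials)
```

The open question in the aside is then really about doing this for all rows at once, where each row has its own N.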