# How to randomly set a fixed number of elements in each row of a tensor

I was wondering if there is a more efficient alternative to the code below, one that avoids the "for" loop in the last line?

```python
import torch

n, d = 37700, 7842
k = 4
sample = torch.cat([torch.randperm(d)[:k] for _ in range(n)]).view(n, k)
```

Basically, what I am trying to do is create an n x d mask tensor such that in each row exactly k random elements are True.
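For concreteness, the (n, k) index tensor from the code above can be turned into the desired (n, d) boolean mask with `Tensor.scatter_`; this is a sketch with small sizes, not part of the original question:

```python
import torch

n, d, k = 6, 10, 3  # small sizes for illustration

# sample k distinct column indices per row, as in the code above
idx = torch.cat([torch.randperm(d)[:k] for _ in range(n)]).view(n, k)

# scatter True at the sampled column indices of each row
mask = torch.zeros(n, d, dtype=torch.bool)
mask.scatter_(1, idx, True)

print(mask.sum(dim=1))  # each row has exactly k True entries
```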

Here’s a potential solution:

More efficient approaches are welcome.

The obvious alternative would be `torch.multinomial(torch.ones(n, d), k)`, which shaved off roughly a third of the time for me but is still somewhat slow.
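Spelled out as a runnable sketch (with small stand-in sizes rather than the original 37700 x 7842):

```python
import torch

n, d, k = 6, 10, 3  # small sizes for illustration

# With replacement=False (the default), multinomial draws k distinct
# column indices per row; uniform weights make every column equally likely.
sample = torch.multinomial(torch.ones(n, d), k)
print(sample.shape)  # torch.Size([6, 3])
```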
I can halve the time (relative to your code, on my machine, on CPU, etc.) by using rand + topk.

```python
sample = torch.rand(n, d).topk(k, dim=1).indices
```
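To check that the two approaches agree on shape and distinctness, and to get a rough feel for the timing on your own machine, something like the following can be used; the sizes are scaled down from the original, and absolute timings will vary:

```python
import time
import torch

n, d, k = 2000, 500, 4  # scaled down from 37700 x 7842 for a quick check

def perm_version():
    # original approach: one full randperm per row, keep the first k
    return torch.cat([torch.randperm(d)[:k] for _ in range(n)]).view(n, k)

def topk_version():
    # rand + topk: the indices of the k largest of d uniform draws
    # form a uniformly random k-subset of each row
    return torch.rand(n, d).topk(k, dim=1).indices

for fn in (perm_version, topk_version):
    t0 = time.perf_counter()
    out = fn()
    print(f"{fn.__name__}: {tuple(out.shape)}, {time.perf_counter() - t0:.4f}s")
```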

I imagine that for k much smaller than d you could be blazingly fast with your own GPU kernel that just loops until it has found k new indices per row: with n = 37700 independent rows, the problem is one giant opportunity for parallelization on the GPU.
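The per-row loop described above can be illustrated in plain Python; this is a CPU sketch of the kernel logic, not an actual CUDA kernel (on the GPU, each row would run in its own thread):

```python
import torch

def sample_rows_rejection(n, d, k, generator=None):
    # Per row, keep drawing random column indices until k distinct ones
    # are collected -- cheap when k << d, since collisions are rare.
    out = torch.empty(n, k, dtype=torch.long)
    for i in range(n):
        seen = set()
        while len(seen) < k:
            seen.add(int(torch.randint(d, (1,), generator=generator)))
        out[i] = torch.tensor(sorted(seen))
    return out

sample = sample_rows_rejection(8, 100, 4)
```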

Best regards

Thomas


Had I known you had gone to Stack Overflow to get the same answer, I would not have gone through the trouble of benchmarking things.
I spent 15 minutes on your problem trying to help you, only to find that you would have been fine without it.


Thank you for your answer. Actually, your answer works better for me than the one on Stack Overflow, both in terms of running time and similarity to my own code, with only one line changed.

But I think I have the right to seek an answer from different sources, right? It's not against the community guidelines, I suppose. Also, someone else could have posted an answer here (instead of Stack Overflow) a few minutes before you did, so I don't think it's my fault. Anyway, I really appreciate your time, and your answer not only helped me but could also help other people in the future.

Best regards,
Sina


Oh, you have every right to ask wherever you want, as often as you want, and so on.
I would just think it is common courtesy, and sound use of a shared resource, not to ask the same question in multiple places, causing several people to invest time helping you when they could have helped the next person instead, with much the same outcome.

In retrospect, I feel I should have done something else with my time, but then again, maybe I should not be answering questions at all.

Best regards and good luck with your project

Thomas

Again, I am so grateful for your time and effort and sorry for any inconvenience I might have caused.

Best,
Sina
