I found that Bernoulli sampling can be very slow when it is used to build a mask matrix.
Here is my test code:
and the speed is:
When running on the GPU, the difference can be even larger. I assume this is because the Bernoulli sampling is done on the CPU, while the random matrix can be created directly on the GPU. But that still leaves a question: why is there a speed difference when both run on the CPU?
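To make the CPU-only comparison concrete, here is a minimal sketch of the two ways of building a mask. It assumes NumPy rather than the framework used in the test above, so it only illustrates the idea and will not reproduce the timings. The names `mask_bernoulli`, `mask_threshold` and `time_it`, the matrix size, and the probability are all hypothetical choices made for illustration.

```python
import timeit

import numpy as np


def mask_bernoulli(shape, p, rng):
    # Dedicated sampler: one Bernoulli(p) trial per entry (binomial with n=1).
    return rng.binomial(1, p, size=shape).astype(np.float32)


def mask_threshold(shape, p, rng):
    # Uniform samples in [0, 1) thresholded at p:
    # each entry is 1 with probability p and 0 otherwise.
    return (rng.random(shape) < p).astype(np.float32)


def time_it(fn, shape, p, repeats=20, seed=0):
    # Average wall-clock seconds per call over `repeats` calls.
    rng = np.random.default_rng(seed)
    return timeit.timeit(lambda: fn(shape, p, rng), number=repeats) / repeats


if __name__ == "__main__":
    shape, p = (1000, 1000), 0.5  # illustrative size and keep probability
    print("bernoulli :", time_it(mask_bernoulli, shape, p))
    print("threshold :", time_it(mask_threshold, shape, p))
```

Both functions produce masks with the same distribution, so any timing gap between them comes from how the samples are generated, not from what is sampled.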
Can someone help me? Thanks!