I have a tensor of shape [n, m, m] (n images, each m×m). I want to mask each image according to the maximum value of each of its rows. The result should have the same shape as the original, but each [m, m] slice is now a mask matrix. What is the fastest way to do this on a GPU?
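
To make the intended operation concrete, here is a minimal CPU reference sketch in NumPy. It assumes the mask should be 1 where an entry equals the maximum of its row and 0 elsewhere, so tied entries are all marked; that reading is an assumption, as is the illustrative function name `row_max_mask`. It only pins down the semantics and is not a GPU implementation.

```python
import numpy as np

def row_max_mask(images: np.ndarray) -> np.ndarray:
    """Return a 0/1 mask with the same shape as `images`.

    Assumption: an entry is masked as 1 when it equals the maximum
    of its own row, and 0 otherwise (ties mark every tied entry).
    """
    # images has shape [n, m, m]; the last axis runs along each row.
    row_max = images.max(axis=-1, keepdims=True)  # shape [n, m, 1]
    # Broadcasting compares every entry against its row's maximum.
    return (images == row_max).astype(images.dtype)

# One 3x3 "image" as a tiny example (n = 1, m = 3).
images = np.array([[[1.0, 3.0, 2.0],
                    [4.0, 0.0, 4.0],
                    [5.0, 6.0, 7.0]]])
mask = row_max_mask(images)
print(mask)
```

The whole computation is one reduction over the last axis followed by one broadcast comparison, with no Python-level loop over images or rows.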