Given a tensor that specifies the sizes of the blocks, I am trying to fill the block-diagonal part of a matrix with a specific value. For instance, let the block sizes be given by size_tensor:
size_tensor = torch.tensor([2,1,3], device='cuda')
The output I want to get looks like this:
output =
[1, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1]
As you can see, each diagonal block, with sizes defined in size_tensor, is filled with a specific value (in this case 1). Note that the blocks are not all the same size. My question is: what is the most efficient way to perform this operation?
Currently, my code looks as follows:
block_components = [torch.full((x,x), 1, device='cuda') for x in size_tensor]
output = torch.block_diag(*block_components)
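For completeness, here is a self-contained, runnable version of this loop-based approach (on CPU here so it works without a GPU; the logic is the same with device='cuda'):

```python
import torch

# Build one dense square block per size, then stitch them together
# along the diagonal with torch.block_diag.
size_tensor = torch.tensor([2, 1, 3])
block_components = [torch.full((x, x), 1) for x in size_tensor.tolist()]
output = torch.block_diag(*block_components)
# output is the 6x6 matrix shown above, with 1s in each diagonal block.
```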
However, this seems a bit slow when I'm working on a GPU. In my case, size_tensor actually comes from previous operations on the GPU, so it lives on the GPU. Since iterating over a GPU tensor is slow (each element access forces a device-to-host transfer), if I change the above code to
block_components = [torch.full((x,x), 1, device='cuda') for x in size_tensor.to('cpu')]
output = torch.block_diag(*block_components)
the code runs a bit faster. However, this version still sends a GPU tensor (size_tensor) to the CPU, so it does not seem optimal either.
So my question is: what is the most efficient way to create such a block-diagonal matrix in this situation?
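For reference, here is a loop-free variant I have been experimenting with; I am not sure it is optimal either, but it stays entirely on one device with no host round trip. It assigns each row its block index via torch.repeat_interleave, then marks entries whose row and column belong to the same block. (block_diag_mask is just my own helper name.)

```python
import torch

def block_diag_mask(size_tensor, value=1):
    # One block index per row/column: e.g. [2, 1, 3] -> [0, 0, 1, 2, 2, 2].
    ids = torch.repeat_interleave(
        torch.arange(size_tensor.numel(), device=size_tensor.device),
        size_tensor,
    )
    # An entry (i, j) lies in a diagonal block iff ids[i] == ids[j].
    return (ids.unsqueeze(0) == ids.unsqueeze(1)).to(torch.long) * value

output = block_diag_mask(torch.tensor([2, 1, 3]))
```

All operations here (arange, repeat_interleave, broadcasted comparison) run on whatever device size_tensor lives on, so nothing is copied to the CPU.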