Given a tensor that specifies the sizes of block matrices, I am trying to fill the block-diagonal parts with a specific value. For instance, let the size of each block be
size_tensor = torch.tensor([2,1,3], device='cuda')
The output I want to get looks like this:
output = tensor([[1, 1, 0, 0, 0, 0],
                 [1, 1, 0, 0, 0, 0],
                 [0, 0, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1],
                 [0, 0, 0, 1, 1, 1],
                 [0, 0, 0, 1, 1, 1]])
As you can see, each diagonal block is filled with a specific value (in this case 1), with the block sizes given by size_tensor. Note that the blocks are not all the same size. My question is: what would be the most efficient way to perform this operation?
Currently, my code looks as follows:
block_components = [torch.full((x, x), 1, device='cuda') for x in size_tensor]
output = torch.block_diag(*block_components)
However, this seems a bit slow when working on a GPU device. In my case, size_tensor comes from previous operations on the GPU, so it already lives there. Since iterating over a GPU tensor is slow, if I change the above code to
block_components = [torch.full((x, x), 1, device='cuda') for x in size_tensor.to('cpu')]
output = torch.block_diag(*block_components)
the code runs a bit faster. However, this version still transfers a GPU tensor (size_tensor) to the CPU, so it does not seem optimal either.
So my question is: what would be the most efficient way to create such a block-diagonal matrix in this situation?
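For context, one fully vectorized direction I have been experimenting with is to label every row/column index with the id of the block it belongs to and then compare the labels pairwise, so the Python loop and torch.block_diag disappear entirely. This is only a sketch (torch.repeat_interleave may still synchronize internally to determine its output size; I have not profiled it against the loop version):

```python
import torch

# Example sizes; in my real code this tensor already lives on the GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
size_tensor = torch.tensor([2, 1, 3], device=device)

# Assign each row/column index the id of its block:
# [2, 1, 3] -> block_ids = [0, 0, 1, 2, 2, 2]
block_ids = torch.repeat_interleave(
    torch.arange(len(size_tensor), device=device), size_tensor
)

# Entry (i, j) is 1 exactly when indices i and j fall in the same block
output = (block_ids.unsqueeze(0) == block_ids.unsqueeze(1)).long()
```

For a fill value other than 1, the boolean mask could be multiplied by that value instead of being cast with .long().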