# Filling block diagonal part with varying size

Given a tensor that specifies the sizes of the block matrices, I am trying to fill the block-diagonal parts with a specific value. For instance, let the size of each block be given by `size_tensor`:

```
size_tensor = torch.tensor([2, 1, 3], device='cuda')
```

The output I want to get looks like this:

```
output =
[1, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1]
```

As you can see, each diagonal block, with its size taken from `size_tensor`, is filled with a specific value (in this case `1`). Note that the blocks are not all the same size. My question is: what would be the most efficient way to perform this operation?

Currently, my code looks as follows:

```
# Build one dense block per size, then assemble them on the diagonal
block_components = [torch.full((x, x), 1, device='cuda') for x in size_tensor]
output = torch.block_diag(*block_components)
```

However, this seems a bit slow when running on the GPU. In my case, `size_tensor` comes from previous operations on the GPU, so it lives in GPU memory. Since iterating over a GPU tensor is slow (each element access triggers a device-to-host synchronization), I changed the code to:

```
block_components = [torch.full((x, x), 1, device='cuda') for x in size_tensor.to('cpu')]
output = torch.block_diag(*block_components)
```

This version runs a bit faster. However, it still has to copy the GPU tensor `size_tensor` to the CPU, so it does not seem optimal either.
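For comparison, one loop-free formulation I have considered (a sketch, not a measured solution): label every row/column with the index of the block it belongs to via `torch.repeat_interleave`, then compare the labels pairwise. Note that `repeat_interleave` with a tensor of repeats may still sync to compute the output length unless the `output_size` argument is supplied.

```python
import torch

size_tensor = torch.tensor([2, 1, 3])  # use device='cuda' in practice

# block_ids = [0, 0, 1, 2, 2, 2] -- one block index per row/column
block_ids = torch.repeat_interleave(
    torch.arange(size_tensor.numel(), device=size_tensor.device),
    size_tensor,
)
# Two positions lie in the same diagonal block iff their block ids match
output = (block_ids.unsqueeze(0) == block_ids.unsqueeze(1)).long()
```

I am not sure whether this is actually faster than `torch.block_diag` for my sizes, but it avoids the Python-level loop over `size_tensor`.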

So my question is, what would be the most efficient way of creating block diagonal matrix in this situation?