# How to Create Upper Triangular Matrix with Diagonals of Increasing Power?

I’d like to create a square upper triangular matrix that is defined as follows for some float `c` and some dimension `N`:

``````
[[1, c, c^2, ..., c^{N-1}],
 [0, 1,   c, ..., c^{N-2}],
 [0, 0,   1, ..., c^{N-3}],
 ...
 [0, 0,   0, ...,       1]]
``````
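In other words, entry `(i, j)` is `c^(j-i)` on and above the diagonal (`j >= i`) and `0` below it.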

For concreteness, if `N=2`, then the matrix should be

``````
[[1, c],
 [0, 1]]
``````

And if `N=3`, then the matrix should be:

``````
[[1, c, c^2],
[0, 1,   c],
[0, 0,   1]]
``````

How can I do this?

Right now I’m doing the following. Is there a faster way?

``````
discount_matrix = torch.triu(torch.ones(size=(N, N)))
# each pass scales every entry at offset >= i by c, so an entry on the
# k-th super-diagonal ends up as c^k
for i in range(1, N):
    triu_idx = torch.triu_indices(row=N, col=N, offset=i)
    discount_matrix[triu_idx[0], triu_idx[1]] *= c
``````

The Python loop won't scale well for larger values of `N`.
You could instead repeat the initial tensor of powers and call `view` on it to create an "interleaved" tensor of shifted rows (traced below). This approach is a bit wasteful in memory, but yields better compute performance.
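To see why the trick works, trace it for `N=3` (using the names from `fun2` below). The row of powers is padded with one zero, so its period is `N+1`; reshaping the repeated tensor to width `N` then shifts each row one step to the right:

``````
tmp               = [1, c, c^2, 0]        # powers of c plus one zero, length N+1
tmp.repeat(3)     = [1, c, c^2, 0, 1, c, c^2, 0, 1, c, c^2, 0]
 .view(-1, 3)     = [[1,   c, c^2],
                     [0,   1,   c],
                     [c^2, 0,   1],
                     [c, c^2,   0]]       # 4 rows; drop the last one with [:-1]
torch.triu(...)   = [[1, c, c^2],
                     [0, 1,   c],
                     [0, 0,   1]]         # zero the leftovers below the diagonal
``````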

Here is a comparison with quick profiling on CPU and GPU:

``````
import time

import matplotlib.pyplot as plt
import numpy as np
import torch

def fun1(N, c, device):
    # original approach: repeatedly scale the bands above the diagonal
    discount_matrix = torch.triu(torch.ones(size=(N, N), device=device))
    for i in range(1, N):
        triu_idx = torch.triu_indices(row=N, col=N, offset=i)
        discount_matrix[triu_idx[0], triu_idx[1]] *= c
    return discount_matrix

def fun2(N, c, device):
    # proposal: one row of powers padded with a zero, then repeat + view
    # shifts each row right by one position
    tmp = torch.tensor([float(c)**i for i in range(N)], device=device)
    tmp = torch.cat((tmp, torch.zeros(1, device=device)))
    res = torch.triu(tmp.repeat(N).view(-1, N)[:-1])
    return res

c = 2
nb_iters = 100

f1_times = []
f2_times = []

device = 'cuda'

for N in range(1, 100):
    # correctness check (exact comparison is safe here: powers of 2 are exact floats)
    print((fun1(N, c, device) == fun2(N, c, device)).all())

    # timing
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(nb_iters):
        out = fun1(N, c, device)
    torch.cuda.synchronize()
    t1 = time.time()
    f1_time = (t1 - t0) / nb_iters * 1000.  # ms per call
    print('fun1({}, {}): {}ms'.format(N, c, f1_time))
    f1_times.append(f1_time)

    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(nb_iters):
        out = fun2(N, c, device)
    torch.cuda.synchronize()
    t1 = time.time()
    f2_time = (t1 - t0) / nb_iters * 1000.  # ms per call
    print('fun2({}, {}): {}ms'.format(N, c, f2_time))
    f2_times.append(f2_time)

plt.figure()
plt.plot(np.log10(np.array(f1_times)))
plt.plot(np.log10(np.array(f2_times)))
``````
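Note the `torch.cuda.synchronize()` calls around each timed region: CUDA kernels launch asynchronously, so without them the loop would mostly measure kernel-launch overhead rather than actual execution time.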

Results for `N in range(1, 100)`. Blue curve: your approach (`fun1`); orange curve: my proposal (`fun2`); y-axis: `log10` of the time in ms; x-axis: `N` (sorry for being too lazy to properly add the legends to the plot).

CPU: *(timing plot)*

GPU: *(timing plot)*

I just encountered the same problem and wanted to share my approach: build the matrix of exponents with a row-wise cumulative sum, raise `c` to those exponents elementwise, and mask out the strictly lower triangle, where the exponent `0` would otherwise leave ones.

``````
def fun3(N, c, device):
    # cumsum over the strict upper triangle of ones gives exponents 0, 1, 2, ... per row
    exponents = torch.triu(torch.ones(N, N, device=device), 1).cumsum(-1)
    power = torch.float_power(c, exponents)  # float_power promotes to float64
    # below the diagonal the exponent is 0, so c^0 = 1 must be zeroed out
    power_mask = torch.tril(torch.ones(N, N, dtype=torch.bool, device=device), diagonal=-1)
    power[power_mask] = 0
    return power

assert torch.isclose(fun2(N, c, device).double(), fun3(N, c, device)).all()
``````
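For example, `N=3`, `c=2` produces the expected matrix (in float64, since `torch.float_power` promotes to double):

``````
>>> fun3(3, 2, 'cpu')
tensor([[1., 2., 4.],
        [0., 1., 2.],
        [0., 0., 1.]], dtype=torch.float64)
``````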