 How to Create Upper Triangular Matrix with Diagonals of Increasing Power?

I’d like to create a square upper triangular matrix that is defined as follows for some float c and some dimension N:

[[1 , c , c^2, ... c^N],
[0,  1 ,   c, ... c^{N-1}],
[0,  0 ,   1, ... c^{N-2}],
.
.
.
[0,  0 ,   0, ....    1]]

For concreteness, if N=2, then the matrix should be

[[1, c],
[0, 1]]

And if N=3, then the matrix should be:

[[1, c, c^2],
[0, 1,   c],
[0, 0,   1]]

How can I do this?

Right now I’m doing the following. Is there a faster way?

discount_matrix = torch.triu(torch.ones(
size=(N, N)))
for i in range(1, N):
tril_idx = torch.triu_indices(
row=N,
col=N,
offset=i)
discount_matrix[tril_idx, tril_idx] *= c

The Python loop wouldn’t scale that well for larger values of N.
You could use an approach of repeating the initial tensor and calling view on it to create an “interleaved” tensor. This approach would be a bit wasteful in memory, but would yield a better compute performance.

Here is the comparison with a quick profiling for CPU and GPU:

def fun1(N, c, device):
discount_matrix = torch.triu(torch.ones(size=(N, N), device=device))
for i in range(1, N):
tril_idx = torch.triu_indices(
row=N,
col=N,
offset=i)
discount_matrix[tril_idx, tril_idx] *= c
return discount_matrix

def fun2(N, c, device):
tmp = torch.tensor([float(c)**i for i in range(N)], device=device)
tmp = torch.cat((tmp, torch.zeros(1, device=device)))
res = torch.triu(tmp.repeat(N).view(-1, N)[:-1])
return res

c = 2
nb_iters = 100

f1_times = []
f2_times = []

device = 'cuda'

for N in range(1, 100):
# correctness check
print((fun1(N, c, device) == fun2(N, c, device)).all())

# timing
torch.cuda.synchronize()
t0 = time.time()
for _ in range(nb_iters):
out = fun1(N, c, device)
torch.cuda.synchronize()
t1 = time.time()
f1_time = (t1 - t0)/nb_iters
print('fun1({}, {}): {}ms'.format(N, c, f1_time))
f1_times.append(f1_time)

torch.cuda.synchronize()
t0 = time.time()
for _ in range(nb_iters):
out = fun2(N, c, device)
torch.cuda.synchronize()
t1 = time.time()
f2_time = (t1 - t0)/nb_iters
print('fun2({}, {}): {}ms'.format(N, c, f2_time))
f2_times.append(f2_time)

plt.figure()
plt.plot(np.log10(np.array(f1_times)))
plt.plot(np.log10(np.array(f2_times)))

Results for N in range(1, 100). Blue curve: your approach, orange curve: my proposal, y-axis: log10 of the time in ms, x-axis: N (sorry for being too lazy to properly add the legends into the plot ).

CPU: GPU: I just encountered the same problem and wanted to share my approach.

def fun3(N, c, device):
exponents = torch.triu(torch.ones(N, N, device=device), 1).cumsum(-1)
power = torch.float_power(c, exponents)
power_mask = torch.tril(torch.ones(N, N, dtype=torch.bool, device=device), diagonal=-1) 