Replace diagonal elements with a vector

square_matrix = Variable(torch.zeros(10,10)) 
# the dim-1 indices to put the diagonals 
index = Variable(torch.arange(10).long().unsqueeze(1))  
# the diagonal, unsqueeze it
diagonal = Variable(torch.randn(10).unsqueeze(1)) 
# use scatter 
square_matrix = square_matrix.scatter(1, index, diagonal) 
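
As a quick sanity check (a sketch reusing the variables above), the diagonal of the result should now equal the vector that was scattered in:

# verify the scatter: the diagonal of square_matrix should match the vector
assert (torch.diag(square_matrix) == diagonal.squeeze(1)).data.all()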

Quick question, guys: if I want to create N of these square matrices as Variables, what is the proper way to do it? E.g.:

L_j = Variable(torch.cuda.FloatTensor(np.zeros([number_of_tensors, dimension, dimension])))
for tensor in range(0, number_of_tensors):
    L_j[tensor] = generate_square_matrix_according_to_iclementine(dimension)

Will that hold?

Actually, if the N square matrices are of the same size, you can also use scatter to do this kind of batch-assignment.

# (N, K, K)
matrix = Variable(torch.zeros(5, 10, 10)) 
# dim-2 indices to scatter into, (N, K, 1) 
index = Variable(torch.arange(10).unsqueeze(0).expand(5, -1).unsqueeze(2)).long()  
# diagonals to scatter, (N, K, 1)
src = Variable(torch.randn(5, 10, 1)) 
# 2 is the dimension to scatter
matrix = matrix.scatter(2, index, src) 
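
A quick sanity check for the batched version (same names as above): each matrix in the batch should now carry its own diagonal.

# verify the batched scatter: matrix[n] should have src[n] on its diagonal
for n in range(5):
    assert (torch.diag(matrix[n]) == src[n].squeeze(1)).data.all()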

Thank you. Don't mind me asking again, but this operation will not affect the backward pass in the created graph, right?

This is what I've been trying recently, and I find that:

  1. If a Variable is a leaf node (created by the user) and requires_grad is True, do not operate in-place on that Variable. (scatter_ is in-place, while scatter is not.)
    (Sorry for the misunderstanding before: a leaf node does not need requires_grad to be True.)

  2. If a Variable is a leaf node (created by the user) and does not require gradient, in-place operations do no harm. (I am not quite sure.)

  3. If a Variable is not created by the user and requires gradient (e.g. it is returned by an nn.Linear() layer), an in-place operation is still okay. The gradients for the weight and bias in the nn.Linear won't go wrong.


An example:

import torch
from torch import nn
from torch.autograd import Variable
import numpy as np

input = Variable(torch.randn(20, 64))
label = Variable(torch.from_numpy(np.random.randint(0,64,size=(20))))

class Composition(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)
    
    def forward(self, head, dependent):
        return self.proj(torch.cat((head, dependent), dim=-1))

composition = Composition(64)
input_orig = input.clone()  # keep an unmodified copy for the second, out-of-place pass

for i in range(1, input.size(0)):
    input[i] = composition(input[i-1], input[i])

ce = nn.CrossEntropyLoss()
loss = ce(input, label)

loss.backward(retain_graph=True)
# retain_graph=True keeps the graph so that we can backward through it again later
a0 = composition.proj.weight.grad.data.clone()  # clone, otherwise a0 would alias the grad buffer

# zero the gradient 
composition.zero_grad()

composed = torch.zeros_like(input_orig)
composed[0] = input_orig[0]

for i in range(1, input_orig.size(0)):
    composed[i] = composition(composed[i-1], input_orig[i])

loss2 = ce(composed, label)
loss2.backward()

a1 = composition.proj.weight.grad.data

assert torch.equal(a0, a1)  # the two gradients match

Point 3 in the last post is tested by this.


What do you mean by this?

Oh no, the underscore was not displayed. In PyTorch, a function whose name ends with an underscore is an in-place operation. (Corrected that above.)
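
A minimal illustration of the naming convention (the shapes here are just placeholders):

v = Variable(torch.zeros(3, 3))
idx = Variable(torch.arange(3).long().unsqueeze(1))
val = Variable(torch.ones(3, 1))

out = v.scatter(1, idx, val)   # out-of-place: v is untouched, the result is a new Variable
v.scatter_(1, idx, val)        # in-place: v itself is modified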

What if you want to do the same with Parameters instead of Variables?

> # (N, K, K)
> matrix = Variable(torch.zeros(5, 10, 10)) 
> # dim-2 indices to scatter into, (N, K, 1) 
> index = Variable(torch.arange(10).unsqueeze(0).expand(5, -1).unsqueeze(2)).long()  
> # diagonals to scatter, (N, K, 1)
> src = Variable(torch.randn(5, 10, 1)) 
> # 2 is the dimension to scatter
> matrix = matrix.scatter(2, index, src)

Parameter is a sub-class of Variable which requires grad by default.
If you create a Parameter via:

param = Parameter(Tensor)

It is the case of a leaf node that requires grad, so an in-place operation is not recommended. If you just want to initialize the Parameter in a specific way, you can create the Tensor first, operate on it, and then create a Parameter with that tensor as its initial value.
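
For example, a minimal sketch of that init pattern (sizes are placeholders; scatter_ is fine here because nothing is tracked by autograd yet):

# build and fill the tensor first, before autograd is involved
t = torch.zeros(10, 10)
t.scatter_(1, torch.arange(10).long().unsqueeze(1), torch.randn(10, 1))
# then wrap it as a Parameter; the scatter above is only initialization
param = nn.Parameter(t)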

If the Parameter is already created and some subsequent operations depending on it have taken place, then an in-place operation on this Parameter will affect the backward pass.

Yes, but in this case the Tensor does not have the "scatter" attribute.

Oh, yes. It's weird that Tensor does not have a scatter attribute.

That's why I'm so confused by this very simple operation. In NumPy it is so easy, but here it's very difficult to do the same without destroying the graph.

I've found that although Tensor does not have a scatter method, it does have a scatter_ method. You can use that.

What do you mean by "without destroying the graph"?
Doing this efficiently while keeping the gradients that correspond to what you computed can be done using scatter_ or advanced indexing as discussed above.
From your original comment, it seems that you don't actually want to compute the gradients corresponding to what you computed in the forward pass (because you don't want this change to be recorded). This means that you actually want to "break the graph" by not recording some operation. To do this efficiently, you can also use scatter_ or advanced indexing while adding an extra .data to explicitly not record these operations.
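
For example, a minimal sketch of that second option, reusing the names from the batched example above; writing through .data means the fill is not recorded by autograd:

# this scatter happens on the underlying tensors, so it is not part of the graph
matrix.data.scatter_(2, index.data, src.data)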

I meant not being able to compute the gradients. I did some investigation for the case where we need Parameters instead of plain Tensors, and the following seems to work in the multi-dimensional case.

# (N, K, K)
temp = torch.randn(N, K, K).cuda()
for i in range(0, N):
    temp[i] = torch.tril(temp[i])
temp = Parameter(temp, requires_grad=True)
# dim-2 indices to scatter into, (N, K, 1)
index = Variable(torch.arange(K).unsqueeze(0).expand(N, -1).unsqueeze(2).cuda(), requires_grad=True).long()
# diagonals to scatter, (N, K, 1)
src = Variable(torch.exp(torch.randn(N, K, 1)).cuda(), requires_grad=True)
# 2 is the dimension to scatter
final_parameter = Parameter(temp.data, requires_grad=True).scatter(2, index, src)

Guys, none of the proposed solutions enforce the diagonal to be positive during optimization. Any ideas?
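
One possible approach (just a sketch, not from this thread): keep an unconstrained vector as the Parameter and map it through exp (or softplus) before scattering, so the effective diagonal stays positive throughout optimization.

# raw_diag is unconstrained; exp(raw_diag) is strictly positive, so the diagonal stays positive
raw_diag = nn.Parameter(torch.randn(5, 10))

# inside the forward pass, with index as in the batched example above:
pos_diag = torch.exp(raw_diag).unsqueeze(2)                      # (N, K, 1)
matrix = Variable(torch.zeros(5, 10, 10)).scatter(2, index, pos_diag)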

@SimonW Hi Simon, I tried your code in PyTorch 0.4.0 after upgrading and for k = 5, I'm getting the following error:

RuntimeError: expand(torch.cuda.FloatTensor{[1, 5]}, size=[5]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)

Are you sure that the vector you are copying from is 1-D?

@SimonW Hi Simon, it works fine. Sorry for that.

But will this operation affect the autograd for the vector and for the matrix A?
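
A quick way to check this (a sketch that reuses index from the batched scatter example earlier in the thread): run a backward pass and see whether a gradient arrives at the scattered vector.

# sketch: gradients should flow back to src through the out-of-place scatter
src = Variable(torch.randn(5, 10, 1), requires_grad=True)
matrix = Variable(torch.zeros(5, 10, 10)).scatter(2, index, src)
matrix.sum().backward()
print(src.grad)   # all ones: each diagonal entry contributes once to the sum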