square_matrix = Variable(torch.zeros(10,10))
# the dim-1 indices to put the diagonals
index = Variable(torch.arange(10).long().unsqueeze(1))
# the diagonal, unsqueeze it
diagonal = Variable(torch.randn(10).unsqueeze(1))
# use scatter
square_matrix = square_matrix.scatter(1, index, diagonal)
Quick question guys: If I want to create N variables of these square matrices, what is the most proper way to do it? e.g.:
L_j = Variable(torch.cuda.FloatTensor(np.zeros([number_of_tensors, dimension, dimension])))
for tensor in range(0, number_of_tensors):
    L_j[tensor] = generate_square_matrix_according_to_iclementine(dimension)
Will that hold?
Actually, if the N square matrices are of the same size, you can also use scatter to do this kind of batch-assignment.
# (N, K, K)
matrix = Variable(torch.zeros(5, 10, 10))
# dim-2 indices to scatter into, (N, K, 1)
index = Variable(torch.arange(10).unsqueeze(0).expand(5, -1).unsqueeze(2)).long()
# diagonals to scatter, (N, K, 1)
src = Variable(torch.randn(5,10, 1))
# 2 is the dimension to scatter
matrix = matrix.scatter(2, index, src)
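A quick way to check the batched scatter above, written without the Variable wrapper (an assumption: a PyTorch version where tensors and Variables are merged and `torch.diag_embed` is available). The sizes `N` and `K` are illustrative only.

```python
import torch

N, K = 5, 10                                        # illustrative sizes
diagonals = torch.randn(N, K)                       # one diagonal per matrix
# (N, K, 1) index tensor whose entry [n, i, 0] equals i
index = torch.arange(K).expand(N, K).unsqueeze(2)
# out-of-place scatter along dim 2: batch[n, i, index[n, i, 0]] = src[n, i, 0]
batch = torch.zeros(N, K, K).scatter(2, index, diagonals.unsqueeze(2))
# compare against the built-in batched diagonal constructor
matches_diag_embed = torch.equal(batch, torch.diag_embed(diagonals))
```

If `matches_diag_embed` is True, the scatter call placed each vector on the diagonal of its own matrix.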
Thank you. Sorry to ask again, but this operation will not affect the backward pass in the created graph, right?
This is what I've been trying recently, and I found that:
- If a Variable is a leaf node (created by the user) and requires_grad is True, do not operate in-place on that Variable (scatter_ is in-place, while scatter is not).
  (Sorry for the misunderstanding before: a leaf node does not have to have requires_grad set to True.)
- If a Variable is a leaf node (created by the user) and does not require gradient, in-place operations do no harm. (I am not quite sure about this.)
- If a Variable is not created by the user and requires gradient (e.g. it is returned by an nn.Linear() layer), an in-place operation is still okay. The gradients of the weight and bias in the nn.Linear won't go wrong.
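The first rule can be observed directly. The following is a minimal sketch, assuming a PyTorch version where plain tensors carry `requires_grad`; the 3x3 size is arbitrary.

```python
import torch

leaf = torch.zeros(3, 3, requires_grad=True)   # leaf node that requires grad
index = torch.arange(3).unsqueeze(1)           # (3, 1) column indices
src = torch.ones(3, 1)

# in-place scatter_ on a leaf that requires grad is rejected by autograd
try:
    leaf.scatter_(1, index, src)
    inplace_error = None
except RuntimeError as err:
    inplace_error = str(err)

# the out-of-place scatter is recorded in the graph and can be backpropagated
out = leaf.scatter(1, index, src)
out.sum().backward()
# overwritten (diagonal) positions receive no gradient, the rest receive 1
```

After this runs, `leaf.grad` is zero on the diagonal, because those entries of `leaf` were replaced by `src` and no longer influence the output.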
An example:
import torch
from torch import nn
from torch.autograd import Variable
import numpy as np

input = Variable(torch.randn(20, 64))
label = Variable(torch.from_numpy(np.random.randint(0, 64, size=(20))).long())
# keep the untouched values so that the second run starts from the same data
original = input.clone()

class Composition(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, head, dependent):
        return self.proj(torch.cat((head, dependent), dim=-1))

composition = Composition(64)
# first run: write the results in-place into the input itself
for i in range(1, input.size(0)):
    input[i] = composition(input[i-1], input[i])
ce = nn.CrossEntropyLoss()
loss = ce(input, label)
loss.backward(retain_graph=True)
# Attention: retain_graph=True keeps the graph's buffers, so this graph
# could be backpropagated through again
# clone, otherwise a0 may alias the gradient buffer that is reused below
a0 = composition.proj.weight.grad.data.clone()
# zero the gradient
composition.zero_grad()
# second run: the same recurrence, written into a separate buffer
composed = torch.zeros_like(original)
composed[0] = original[0]
for i in range(1, original.size(0)):
    composed[i] = composition(composed[i-1], original[i])
loss2 = ce(composed, label)
loss2.backward()
a1 = composition.proj.weight.grad.data
# an elementwise == cannot be used directly in an assert
assert torch.allclose(a0, a1)
The claims in the last post were tested with this.
What do you mean by this?
Oh no, the underscore was not displayed. In PyTorch, a function whose name ends with an underscore is an in-place operation. (I corrected that.)
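The naming convention can be seen with scatter and scatter_ side by side. This is a small sketch on plain tensors with arbitrary 3x3 sizes.

```python
import torch

base = torch.zeros(3, 3)
index = torch.arange(3).unsqueeze(1)   # (3, 1) column indices
src = torch.ones(3, 1)

# scatter (no underscore) returns a new tensor and leaves base untouched
copy = base.scatter(1, index, src)
base_unchanged = torch.equal(base, torch.zeros(3, 3))

# scatter_ (trailing underscore) modifies base itself and returns it
returned = base.scatter_(1, index, src)
base_changed = torch.equal(base, torch.eye(3))
```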
What about if you want to do the implementation with Parameters instead of Variables?
> # (N, K, K)
> matrix = Variable(torch.zeros(5, 10, 10))
> # dim-2 indices to scatter into, (N, K, 1)
> index = Variable(torch.arange(10).unsqueeze(0).expand(5, -1).unsqueeze(2)).long()
> # diagonals to scatter, (N, K, 1)
> src = Variable(torch.randn(5,10, 1))
> # 2 is the dimension to scatter
> matrix = matrix.scatter(2, index, src)
Parameter is a sub-class of Variable which requires grad by default.
If you create a Parameter via:
param = Parameter(Tensor)
This is the case of a leaf node that requires grad, so in-place operations are not recommended. If you just want to initialize the Parameter in a specific way, you can create the Tensor first, operate on it, and then create a Parameter with that tensor as its initial value.
If the Parameter has already been created and some subsequent operations depending on it have taken place, then an in-place operation on this Parameter will affect the backward pass.
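The "build the tensor first, wrap it afterwards" advice could look like the sketch below. The sizes and the choice of a strictly positive random diagonal are assumptions made for illustration; batched `tril` is assumed to be supported by the installed PyTorch version.

```python
import torch
from torch.nn import Parameter

N, K = 5, 10                                       # illustrative sizes
# plain tensor, not tracked by autograd yet: in-place edits are harmless here
init = torch.randn(N, K, K).tril()                 # keep the lower triangle
index = torch.arange(K).expand(N, K).unsqueeze(2)  # (N, K, 1)
init.scatter_(2, index, torch.rand(N, K, 1) + 0.1) # diagonal values >= 0.1
# only now wrap the finished tensor as a leaf Parameter
weight = Parameter(init)
```

Because all in-place work happens before the Parameter exists, nothing has to be recorded or undone in the graph.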
Yes, but in this case the Tensor does not have the "scatter" attribute.
Oh, yes. It's weird that Tensor does not have a scatter attribute.
That's why I'm so confused by this very simple operation. In NumPy it is so easy, but here it's very difficult to do the same without destroying the graph.
I've found that though Tensor does not have a scatter method, it does have a scatter_ method. You can use that.
What do you mean by "without destroying the graph"?
Doing this efficiently while keeping the gradients that correspond to what you computed can be done using scatter_ or advanced indexing as discussed above.
From your original comment, it seems that you don't actually want to compute the gradients corresponding to what you computed in the forward pass (because you don't want this change to be recorded). This means that you actually want to "break the graph" by not recording some operation. To do this efficiently, you can also use scatter_ or advanced indexing, adding an extra .data to explicitly not record these operations.
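A sketch of an unrecorded in-place write. Note the substitution: it uses `torch.no_grad()` instead of the `.data` attribute mentioned above, on the assumption that a PyTorch version providing that context manager is installed. The 4x4 size is arbitrary.

```python
import torch

w = torch.randn(4, 4, requires_grad=True)   # leaf that requires grad
index = torch.arange(4).unsqueeze(1)        # (4, 1) column indices

# inside no_grad the in-place write is allowed and is not recorded
with torch.no_grad():
    w.scatter_(1, index, torch.ones(4, 1))

# later computations see the new values; w is still a leaf
loss = (w ** 2).sum()
loss.backward()
# w.grad equals 2 * w, evaluated at the overwritten values
```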
I meant not being able to compute the gradients. I managed to do some investigation in the case where we need Parameters instead of simple Tensors and the following seems to work in the multi-dimensional case.
# (N, K, K)
temp = torch.randn(N, K, K).cuda()
for i in range(0, N):
    temp[i] = torch.tril(temp[i])
temp = Parameter(temp, requires_grad=True)
# dim-2 indices to scatter into, (N, K, 1)
# (integer indices cannot require grad, so requires_grad is not set here)
index = Variable(torch.arange(K).unsqueeze(0).expand(N, -1).unsqueeze(2).cuda()).long()
# diagonals to scatter, (N, K, 1)
src = Variable(torch.exp(torch.randn(N, K, 1)).cuda(), requires_grad=True)
# 2 is the dimension to scatter
final_parameter = Parameter(temp.data, requires_grad=True).scatter(2, index, src)
Guys, none of the proposed solutions enforces the diagonal to be positive during optimization. Any ideas?
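One common way to keep the diagonal positive throughout training is to store an unconstrained parameter and map its diagonal through exp every time the matrix is built. The sketch below is an assumption about how that could be done, not something proposed in this thread; the class name `PositiveDiagLowerTriangular`, the toy loss and the learning rate are all hypothetical.

```python
import torch
from torch import nn

class PositiveDiagLowerTriangular(nn.Module):   # hypothetical helper name
    def __init__(self, n, k):
        super().__init__()
        # unconstrained values; the optimizer updates these freely
        self.raw = nn.Parameter(torch.randn(n, k, k))

    def forward(self):
        # strictly lower part is taken as-is
        strict_lower = torch.tril(self.raw, diagonal=-1)
        # exp maps any real number to a positive one
        diag = torch.exp(torch.diagonal(self.raw, dim1=-2, dim2=-1))
        return strict_lower + torch.diag_embed(diag)

torch.manual_seed(0)
module = PositiveDiagLowerTriangular(5, 10)
optimizer = torch.optim.SGD(module.parameters(), lr=1e-3)
for _ in range(3):                              # a few toy optimization steps
    optimizer.zero_grad()
    loss = (module() ** 2).sum()                # placeholder loss
    loss.backward()
    optimizer.step()
L = module()                                    # rebuilt from updated values
```

The positivity comes from the parameterization, so it holds after every optimizer step without any projection. `torch.nn.functional.softplus` could be used in place of `torch.exp` if slower growth is preferred.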
@SimonW Hi Simon, I tried your code in PyTorch 0.4.0 after upgrading and for k = 5, I'm getting the following error:
RuntimeError: expand(torch.cuda.FloatTensor{[1, 5]}, size=[5]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)
Are you sure that the vector you copy from is 1-D?
@SimonW Hi Simon, it works fine. Sorry for that.
But will this operation affect the autograd for the vector and for the matrix A?
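The code being discussed is not shown here, so the following is only an assumption about a comparable operation: replacing the diagonal of a matrix `A` with a vector `v` out-of-place by masking, and then inspecting which entries receive gradient.

```python
import torch

A = torch.randn(5, 5, requires_grad=True)
v = torch.randn(5, requires_grad=True)       # 1-D vector for the diagonal

mask = torch.eye(5)
# keep the off-diagonal part of A and put v on the diagonal (out-of-place)
B = A * (1 - mask) + torch.diag(v)
B.sum().backward()
# v receives gradient for every entry; A only for its off-diagonal entries
```

In this sketch autograd keeps working for both tensors. The diagonal entries of `A` get zero gradient simply because they no longer contribute to `B`.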