How to conveniently combine elements of a tensor into a new tensor?

Let’s say that given a tensor of length 3 with requires_grad=True, I want to manually create a 3x3 skew-symmetric matrix for that tensor.

As a PyTorch newbie, this is what I would expect to work:

import torch

def variant_1(x):
    skew_symmetric_mat = torch.tensor([
        [0, -x[2], x[1]],
        [x[2], 0.0, -x[0]],
        [-x[1], x[0], 0.0]
    ])
    return skew_symmetric_mat

vec = torch.rand(3, requires_grad=True)
variant_1(vec).backward(torch.ones(3, 3))  # RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

However, variant_1 fails with the runtime error shown in the snippet. I guess the underlying problem is that creating skew_symmetric_mat by populating it with elements of the tensor x is not a differentiable operation: when skew_symmetric_mat is initialized, only the values of x get copied into it, so no backward graph is created that references the elements of x.
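
As a quick sanity check of this guess (a minimal sketch along the lines of variant_1), the matrix built via torch.tensor() indeed comes out detached from x:

x = torch.rand(3, requires_grad=True)
mat = torch.tensor([
    [0, -x[2], x[1]],
    [x[2], 0.0, -x[0]],
    [-x[1], x[0], 0.0]
])
print(mat.requires_grad)  # False -- only the values of x were copied
print(mat.grad_fn)        # None  -- no backward graph references x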

With this assumption in mind, I was able to write a fully working function variant_2(), which doesn't assign the elements of x to skew_symmetric_mat, but rather multiplies them by constant tensors, which presumably results in the computation graph (DAG) being built properly:

def variant_2(x):
    skew_symmetric_mat = torch.zeros(3, 3)

    skew_symmetric_mat += x[0] * torch.tensor([
        [0, 0, 0],
        [0, 0, -1.0],
        [0, 1.0, 0]
    ])

    skew_symmetric_mat += x[1] * torch.tensor([
        [0, 0, 1.0],
        [0, 0, 0],
        [-1.0, 0, 0.0],
    ])

    skew_symmetric_mat += x[2] * torch.tensor([
        [0, -1.0, 0],
        [1.0, 0, 0],
        [0, 0, 0.0],
    ])

    return skew_symmetric_mat

vec = torch.rand(3, requires_grad=True)
variant_2(vec).backward(torch.ones(3, 3))  # Computes `vec.grad` just fine

My question is: variant_2 seems a bit too verbose for my liking, and it seems computationally wasteful too. Surely there must be a way to write code that is as compact as variant_1() while also being computationally efficient. How would you go about writing this?

P.S. Apologies for such a trivial question. I couldn’t even find the right terminology to Google a solution.

If you are open to using external libraries, please try kornia's vector_to_skew_symmetric_matrix() for this conversion.

https://kornia.readthedocs.io/en/latest/geometry.conversions.html#kornia.geometry.conversions.vector_to_skew_symmetric_matrix

Thank you for pointing me to the library. My end goal was not necessarily to create a skew-symmetric matrix, but to learn the general best practice for constructing an intermediate tensor from the elements of other tensor(s), which can then be used when computing a complicated expression.

Thanks to your pointer, I was able to look up the source code of vector_to_skew_symmetric_matrix(), which uses torch.stack() to construct the skew-symmetric matrix. I assume that torch.stack() is an autograd-compatible (i.e. differentiable) alternative to creating tensors via torch.tensor(). Based on that, I created variant_3().

May I ask if this is really the most performant and "best practice" way to implement such an operation? I assume the Kornia devs wouldn't ship a poorly performing technique in their library, so their approach should be the best way to implement a skew-symmetric matrix, right?

def variant_3(x):
    zero = torch.tensor(0.0)

    skew_symmetric_mat = torch.stack([
        torch.stack([zero, -x[2], x[1]]),
        torch.stack([x[2], zero, -x[0]]),
        torch.stack([-x[1], x[0], zero])
    ])

    return skew_symmetric_mat
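
And a quick check in the same style as the earlier snippets, confirming that gradients flow through variant_3:

vec = torch.rand(3, requires_grad=True)
variant_3(vec).backward(torch.ones(3, 3))
print(vec.grad)  # tensor([0., 0., 0.]) -- each element enters once with + and once with - sign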

I think the main issue is that gradients are not tracked when constructing a tensor with torch.tensor() or torch.Tensor() from individual elements of another tensor, whereas torch.stack() keeps the computation graph intact and is able to backpropagate.

In my understanding, torch.tensor() and torch.Tensor() are meant to be constructors that create a new tensor; they are not meant to keep the computation graph intact. So I would avoid using them in between end-to-end network operations whenever gradients need to be backpropagated.
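
For instance, the same element-wise construction goes through torch.stack() without losing the graph (a minimal sketch):

src = torch.randn(3, requires_grad=True)
stacked = torch.stack([src[0], src[1], src[2]])
print(stacked.grad_fn)   # <StackBackward0 ...> -- the graph is kept
stacked.sum().backward()
print(src.grad)          # tensor([1., 1., 1.])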

Just wanted to document some other quirks of torch.tensor() and torch.Tensor() regarding memory sharing between the source tensor and the created tensor.

Creation of a tensor from individual elements of a source tensor

  • torch.tensor(), torch.Tensor(): When creating a tensor from individual elements of other tensors, gradients are not tracked.
src = torch.randn(3, requires_grad=True)
clone = torch.tensor([src[0], src[1], src[2]])
clone.sum().backward()
# RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

src = torch.randn(3, requires_grad=True)
clone = torch.Tensor([src[0], src[1], src[2]])
clone.sum().backward()
# RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

torch.tensor(source_tensor)

  • torch.tensor() always copies the data (it does not share memory with the source) and does not track gradients.
src = torch.randn(3, requires_grad=True)
clone = torch.tensor(src)
clone.sum().backward()
# RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

clone[1] = 5.0
print(f"src {src}, clone {clone}")
# src tensor([ 0.3578,  2.4084, -1.3494], requires_grad=True), clone tensor([ 0.3578,  5.0000, -1.3494])
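
As an aside, if one actually wants a copy of the whole tensor that keeps the graph, clone() is the differentiable alternative (a minimal sketch):

src = torch.randn(3, requires_grad=True)
copy = src.clone()   # differentiable copy, grad_fn=<CloneBackward0>
copy.sum().backward()
print(src.grad)      # tensor([1., 1., 1.])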

torch.Tensor(source_tensor)

  • When called with a whole source tensor, torch.Tensor() just creates an alias to the source tensor and shares its memory.
src = torch.randn(3)
clone = torch.Tensor(src)
print(f"src {src}, clone {clone}")
# src tensor([-0.7193, -2.2648,  1.5096]), clone tensor([-0.7193, -2.2648,  1.5096])

clone[1] = 5.0
print(f"src {src}, clone {clone}")
# src tensor([-0.7193,  5.0000,  1.5096]), clone tensor([-0.7193,  5.0000,  1.5096])

src.requires_grad = True
print(f"src {src}, clone {clone}")
# src tensor([-0.7193,  5.0000,  1.5096], requires_grad=True), clone tensor([-0.7193,  5.0000,  1.5096], grad_fn=<AsStridedBackward0>)

clone.sum().backward()
print(src.grad)
# tensor([1., 1., 1.])

Thank you for pointing out the quirks; this is going to save me a lot of debugging time 🙂.

Hi Rou!

For your particular example of generating a skew-symmetric matrix from a (requires_grad = True) vector of its independent components, it will likely be fastest to compute the matrix, as in your variant_2(), but to package the "basis" matrices of the skew-symmetric matrix more efficiently:

>>> import torch
>>> print (torch.__version__)
2.4.1
>>>
>>> skew_components =  torch.tensor ([
...     [[0, 0, 0],
...      [0, 0, -1.0],
...      [0, 1.0, 0]],
...     [[0, 0, 1.0],
...      [0, 0, 0],
...      [-1.0, 0, 0.0]],
...     [[0, -1.0, 0],
...      [1.0, 0, 0],
...      [0, 0, 0.0]]
... ])
>>>
>>> vec = torch.randn (3, requires_grad = True)
>>>
>>> skew_mat = skew_components @ vec
>>>
>>> skew_mat.backward (torch.ones (3, 3))
>>>
>>> vec.grad   # zeros are expected
tensor([0., 0., 0.])

More generally, when you are generating a derived tensor from a (requires_grad = True) initial tensor, whether it will be more efficient to "assemble" (e.g., with things like stack()) or to compute the derived tensor will depend on the details. But when the computation can be packaged as a single tensor operation (such as skew_components @ vec), that will likely be the fastest way to go. (After all, that's what pytorch is optimized for.)
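
As a side note, the same precomputed skew_components can also be applied to a whole batch of vectors in one shot, for example with einsum (a minimal sketch, assuming a batch of shape (B, 3)):

vecs = torch.randn(5, 3, requires_grad=True)                      # batch of 5 vectors
skew_mats = torch.einsum('cij,bj->bci', skew_components, vecs)    # shape (5, 3, 3)
skew_mats.backward(torch.ones(5, 3, 3))
print(vecs.grad)   # all zeros, for the same reason as above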

Best.

K. Frank