Understanding unsqueeze

Let us assume that we want to multiply two tensors:

import torch

t_a = torch.randn(2, 3, 5, 5)          # 2 batches, 3 channels, 5 rows, 5 cols
t_b = torch.tensor([0.26, 0.28, 0.45]) # 1D tensor with 3 elements

In the problem I am reading about, they use unsqueeze followed by an in-place unsqueeze_ before multiplying the tensors, like so:

t_c = t_b.unsqueeze(-1).unsqueeze_(-1)  # first expand dims

result = t_a * t_c # then multiply

I assume we are expanding the tensor t_b so that it matches t_a in at least one dimension, for the purpose of broadcasting. So what shape does t_c end up with?

What does t_b.unsqueeze(-1) do? Where does it add a dimension?

Why are we using unsqueeze_(-1) in-place here? Is this because the first unsqueeze creates a new object in memory and the second one is just modifying the second object instead of creating a third?

Thanks for the help!

unsqueeze(-1) adds a new dimension of size 1 after the last dimension, so the shape of t_b transforms as follows:

t_b.unsqueeze(-1).unsqueeze(-1)
 (3)  (3,1)        (3,1,1)      <- shapes
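
A quick way to sanity check this (a minimal sketch using the t_b from this thread) is to print the shape after each call:

import torch

t_b = torch.tensor([0.26, 0.28, 0.45])
print(t_b.shape)                              # torch.Size([3])
print(t_b.unsqueeze(-1).shape)                # torch.Size([3, 1])
print(t_b.unsqueeze(-1).unsqueeze(-1).shape)  # torch.Size([3, 1, 1])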

For comparison, if we wanted to add the new dimensions in “front” we would do:

t_b.unsqueeze(0).unsqueeze(0)
 (3)  (1,3)        (1,1,3)      <- shapes
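
The same kind of shape check works for the "front" version (again just a sketch):

import torch

t_b = torch.tensor([0.26, 0.28, 0.45])
print(t_b.unsqueeze(0).shape)               # torch.Size([1, 3])
print(t_b.unsqueeze(0).unsqueeze(0).shape)  # torch.Size([1, 1, 3])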

Chaining two unsqueeze calls (or any Python methods, for that matter) is equivalent to

t_b = t_b.unsqueeze(-1)
t_b = t_b.unsqueeze(-1)

Thanks.

So in this scenario we would be broadcasting 0.26 to the first channel, 0.28 to the second, and 0.45 to the third channel?

Why don’t we need to add a 4th dimension, in-front, for the batch?

Lastly, would t_b = t_b[None] do the same as t_b = t_b.unsqueeze(-1)?

I suggest taking a look at the documentation about broadcasting semantics.

According to the docs:

Two tensors are “broadcastable” if the following rules hold:

  1. Each tensor has at least one dimension.
  2. When iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes must either be equal, one of them is 1, or one of them does not exist.

In this case we have the shapes (2,3,5,5) and (3,1,1). The trailing dimension is the rightmost one, so the sizes are compared from right to left; the leading dimension of t_a (the batch dimension of size 2) has no counterpart in t_c.

Valid according to point 2, since "one of them does not exist"
v
(2,3,5,5)
  (3,1,1)
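
If you want PyTorch to apply these rules for you, torch.broadcast_shapes computes the resulting shape (available in reasonably recent PyTorch versions); a small sketch:

import torch

print(torch.broadcast_shapes((2, 3, 5, 5), (3, 1, 1)))  # torch.Size([2, 3, 5, 5])
# an incompatible pair, e.g. (2, 3, 5, 5) and (4, 1, 1), would raise a RuntimeError instead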

My intuition is that when the channel dimensions are matched, each number in t_b is distributed across its channel. Since t_b does not have a batch dimension, the same per-channel multiplication simply gets repeated for each item in the batch.
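
You can check that intuition directly (a sketch built on the tensors from this thread): each batch item ends up scaled the same way, channel by channel.

import torch

t_a = torch.randn(2, 3, 5, 5)
t_c = torch.tensor([0.26, 0.28, 0.45]).unsqueeze(-1).unsqueeze(-1)  # shape (3, 1, 1)
result = t_a * t_c

print(torch.allclose(result[0], t_a[0] * t_c))         # True: batch item 0
print(torch.allclose(result[1], t_a[1] * t_c))         # True: batch item 1, scaled identically
print(torch.allclose(result[:, 1], t_a[:, 1] * 0.28))  # True: channel 1 scaled by 0.28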

I would suggest playing around with tensors that are easier to sanity check (print them and see what happens), e.g. with this code:

import torch

a = torch.ones(3, 4, 4)
b = torch.tensor([1, 2, 3])
# channel 0 is all 1s, channel 1 all 2s, channel 2 all 3s
print(a * b.unsqueeze(-1).unsqueeze(-1))

a = torch.ones(2, 3, 4, 4)
b = torch.tensor([1, 2, 3])
# same per-channel values, repeated for both batch items
print(a * b.unsqueeze(-1).unsqueeze(-1))

Lastly, would t_b = t_b[None] do the same as t_b = t_b.unsqueeze(-1) ?

No, it would not; it would be equivalent to t_b.unsqueeze(0). Doing t_b[..., None] would be equivalent to .unsqueeze(-1), and t_b[..., None, None] would be equal to t_b.unsqueeze(-1).unsqueeze(-1).
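
A short sketch to confirm those equivalences shape-wise:

import torch

t_b = torch.tensor([0.26, 0.28, 0.45])
print(t_b[None].shape)             # torch.Size([1, 3]), same as t_b.unsqueeze(0)
print(t_b[..., None].shape)        # torch.Size([3, 1]), same as t_b.unsqueeze(-1)
print(t_b[..., None, None].shape)  # torch.Size([3, 1, 1]), same as t_b.unsqueeze(-1).unsqueeze(-1)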


Awesome answers. Thanks for taking the time to thoughtfully give me that insight.
