How to concatenate a 2D tensor with a 4D tensor without changing the content

I have two tensors, tensor_A and tensor_B, of the following sizes:

tensor_A = [16, 100]
tensor_B = [16, 3, 64, 64]

I want to concatenate them along tensor_B's channel axis (dim=1, which currently has size 3). First I reshaped tensor_A to have 4 dimensions and then multiplied it by a torch.ones() tensor matching tensor_B's spatial size, with the following code:

tensor_A = tensor_A.view(16, 100, 1, 1)
tensor_A = tensor_A * torch.ones(16, 100, 64, 64, device='cuda')
concat = torch.cat((tensor_A, tensor_B), dim=1)
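
For reference, here is a self-contained version of the same steps, with random data standing in for the real tensors (run on CPU here; device='cuda' works the same way):

import torch

tensor_A = torch.randn(16, 100)        # batch of 16 feature vectors of size 100
tensor_B = torch.randn(16, 3, 64, 64)  # batch of 16 three-channel 64x64 images

tensor_A = tensor_A.view(16, 100, 1, 1)
tensor_A = tensor_A * torch.ones(16, 100, 64, 64)
concat = torch.cat((tensor_A, tensor_B), dim=1)
print(concat.shape)  # torch.Size([16, 103, 64, 64])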

I am able to concatenate the two tensors, but I am not sure whether

  1. this is the right way to do it, or whether a better solution exists, and
  2. doing so changes the meaning of the 2D tensor_A.

What is the meaning of tensor_A?

I can assume that tensor_B is a batch of 64x64 RGB images. Your operation creates a batch of 100-channel 64x64 images (where, within each channel, all 64x64 pixels share the same value) and appends those channels to each of the tensor_B images.

A more efficient operation would be to change:

tensor_A = tensor_A.view(16, 100, 1, 1)
tensor_A = tensor_A * torch.ones(16, 100, 64, 64, device='cuda')

to:

tensor_A = tensor_A.view(16, 100, 1, 1).expand(16, 100, 64, 64)
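
As a quick sanity check (a minimal sketch with random data), the concatenation works the same on the expanded view, since torch.cat() accepts non-contiguous inputs:

import torch

tensor_A = torch.randn(16, 100).view(16, 100, 1, 1).expand(16, 100, 64, 64)
tensor_B = torch.randn(16, 3, 64, 64)

concat = torch.cat((tensor_A, tensor_B), dim=1)
print(concat.shape)  # torch.Size([16, 103, 64, 64])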

tensor_A is a batch of 100-dimensional class features extracted from an encoder. I want to concatenate the batch of features (tensor_A) and the batch of images (tensor_B).

Then this method might work alright.

A problem with the original solution is that the intermediate tensor_A gets materialized at 64x64 = 4096 times the size it actually needs to be, since every repeated value is physically written out. Using only .view() and .expand() fixes this, because the 4096 repeated values are not physical copies but just references to the same memory addresses (the final concat tensor is allocated in full either way).
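
A small sketch that makes this visible:

import torch

a = torch.randn(16, 100).view(16, 100, 1, 1)
e = a.expand(16, 100, 64, 64)

print(e.data_ptr() == a.data_ptr())  # True: the expanded tensor shares a's memory
print(e.stride())                    # (100, 1, 0, 0): zero strides repeat values for free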

Another idea could be to concatenate these inputs later in the neural net. I.e. if you are using a convolutional neural network for your images, you can have an encoder part that extracts information into a linear layer, at which point you add in the 100-dimensional features from tensor_A before the subsequent linear layers, as in the sketch below.
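
For example, a minimal sketch of this late-fusion idea; the layer sizes and the 10-way output here are hypothetical placeholders, not anything from your model:

import torch
import torch.nn as nn

class LateFusionNet(nn.Module):
    def __init__(self, feat_dim=100, num_out=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),   # 64x64 -> 32x32
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(),
            nn.Flatten(),                                           # -> 64*16*16
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 16 * 16 + feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_out),
        )

    def forward(self, images, class_feats):
        h = self.encoder(images)                # [N, 64*16*16]
        h = torch.cat([h, class_feats], dim=1)  # append the 100-dim features here
        return self.head(h)

net = LateFusionNet()
out = net(torch.randn(16, 3, 64, 64), torch.randn(16, 100))
print(out.shape)  # torch.Size([16, 10])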

How about the following operation to replicate the class information spatially and concatenate it?

tensor_A = tensor_A.view(tensor_A.size(0), tensor_A.size(1), 1, 1)
tensor_A = tensor_A.repeat(1, 1, tensor_B.size(2), tensor_B.size(3))
concat = torch.cat([tensor_A, tensor_B], dim=1)

Which of the two solutions (the one you proposed and this one) seems more logical, considering that I want to be computationally efficient while concatenating the extracted class information (tensor_A) into tensor_B?

Tensor.expand() is more memory-efficient than Tensor.repeat(): expand() returns a view without allocating new memory, whereas repeat() physically copies the data.
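
So a shape-generic version of your repeat() snippet that keeps the dynamic sizes but avoids the copy could look like this (passing -1 to expand() keeps that dimension's size):

tensor_A = tensor_A.view(tensor_A.size(0), tensor_A.size(1), 1, 1)
tensor_A = tensor_A.expand(-1, -1, tensor_B.size(2), tensor_B.size(3))
concat = torch.cat([tensor_A, tensor_B], dim=1)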