What's the right way to take a 4D tensor (F, W, H, C) and convert it to (F*C, W, H)?

Consider an output of a convolution which returns a tensor with F filters, where each filter is a (W, H, C) tensor (width, height, channels).

Is there a simple way to “unpack” the channels so that there are F * C grayscale filters? In other words, converting a 4D tensor of shape (F, W, H, C) to (F*C, W, H, 1) or (F*C, W, H), such that it gets sliced along the last dimension and stacked in the first?

The output of a convolution will have the following dimensions: [batch_size, number_of_kernels, w, h].
I think you would like to see the kernels, which have the dimensions: [number_of_kernels, input_channels, kernel_width, kernel_height].

Here is a small example:

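import torch
import torch.nn as nn
import matplotlib.pyplot as plt

conv = nn.Conv2d(in_channels=3,
                 out_channels=6,
                 kernel_size=5)
x = torch.randn(1, 3, 24, 24)  # a plain tensor works; Variable is no longer needed

output = conv(x)
print(output.shape)            # torch.Size([1, 6, 20, 20])
print(conv.weight.data.shape)  # torch.Size([6, 3, 5, 5])

# Merge kernels and input channels: 6 * 3 = 18 grayscale 5x5 filters
conv_ = conv.weight.data.view(-1, 5, 5)

plt.imshow(conv_[0, ...])
plt.show()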

How did you get the tensor with [F, W, H, C]?


Oops you’re right, my first sentence is actually not what I’ve been doing, I’m not dealing with outputs of a convolution, but rather visualizing convolution filters from pre-trained networks.

The filters actually are [num_kernels, num_channels, width, height]

> torchvision.models.alexnet(pretrained=True).features[0].weight.shape
torch.Size([64, 3, 11, 11])

but through my own process of visualizing this in matplotlib I ended up basically doing

> weights = torchvision.models.alexnet(pretrained=True).features[0].weight
> weights.transpose(1,2).transpose(2,3).shape
torch.Size([64, 11, 11, 3])

and then tried to reshape the [64, 11, 11, 3] tensor into [64 * 3, 11, 11] without transposing back. Now that I’ve written it out, that makes no sense: the filters are already stored as [64, 3, 11, 11], so I can just do as you showed, weights.view(-1, 11, 11).
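For anyone following along, here is a minimal sketch of that last step. It uses a random tensor as a stand-in for the AlexNet weights (so nothing has to be downloaded); the shape [64, 3, 11, 11] comes from the post above, and the indices `f` and `c` are arbitrary values picked for the check.

```python
import torch

# Random stand-in for a first-layer weight tensor of shape
# [num_kernels, num_channels, width, height] (assumed, not real AlexNet weights).
weights = torch.randn(64, 3, 11, 11)

# The tensor is contiguous, so view can merge the first two dimensions directly.
gray = weights.view(-1, 11, 11)
print(gray.shape)

# Entry f * 3 + c of the flattened tensor is channel c of kernel f.
f, c = 10, 2
matches = torch.equal(gray[f * 3 + c], weights[f, c])
print(matches)
```

With the real pre-trained weights you would probably want `weights.detach()` before handing slices to matplotlib, since the parameter tracks gradients.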

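To answer the original question in case anyone is interested… Use permute to move the channel dimension next to the filter dimension, then reshape. A plain view does not work here, because permute returns a non-contiguous tensor; either call contiguous first or use reshape.

tensor.permute(0, 3, 1, 2).contiguous().view(F*C, W, H)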
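A quick way to convince yourself that this slices along the last dimension and stacks along the first is to check one slice by hand. The sketch below uses small made-up sizes for F, W, H and C. It uses reshape, which copies when needed, because calling view directly on the permuted (non-contiguous) tensor raises a RuntimeError.

```python
import torch

F, W, H, C = 4, 5, 6, 3  # hypothetical sizes, chosen only for illustration
t = torch.arange(F * W * H * C, dtype=torch.float32).reshape(F, W, H, C)

# Move the channels next to the filter axis: (F, W, H, C) -> (F, C, W, H).
moved = t.permute(0, 3, 1, 2)

# permute only changes strides, so the result is not contiguous in memory.
print(moved.is_contiguous())

# reshape handles the non-contiguous case; .contiguous().view(...) is equivalent.
stacked = moved.reshape(F * C, W, H)
print(stacked.shape)

# Row f * C + c of the result is channel c of filter f.
f, c = 2, 1
same = torch.equal(stacked[f * C + c], t[f, :, :, c])
print(same)
```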

Sweet, didn’t know about .permute!

As a somewhat related question, any idea if there is a way of doing data.view(F*C, H, W) without having to specify the remaining dimensions? Something like data.flatten(0, 1) which would be equivalent to data.reshape(-1, *data.shape[2:])?

I suppose you could do

tensor.view(-1, *tensor.size()[2:])

I don’t know of any other possibilities.
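For what it's worth, recent PyTorch releases do provide `Tensor.flatten(start_dim, end_dim)`, which merges a range of dimensions without spelling out the rest. A small sketch comparing the three spellings, assuming a PyTorch version that has `flatten` and using a random tensor with the shape from earlier in the thread:

```python
import torch

data = torch.randn(64, 3, 11, 11)  # stand-in tensor, shape taken from the thread

# Merge dimensions 0 and 1, leaving the remaining dimensions untouched.
a = data.flatten(0, 1)

# The two spellings discussed above, for comparison.
b = data.reshape(-1, *data.shape[2:])
c = data.view(-1, *data.size()[2:])

print(a.shape)
all_equal = torch.equal(a, b) and torch.equal(a, c)
print(all_equal)
```

On a contiguous tensor all three give the same result; on a permuted tensor only `flatten` and `reshape` would work without an explicit `contiguous()` call.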