Extracting reduced-dimension data from an autoencoder in PyTorch


I have defined my autoencoder in PyTorch as follows:

        self.encoder = nn.Sequential(
            nn.Conv2d(input_shape[0], 32, kernel_size=1, stride=1),
            nn.Conv2d(32, 64, kernel_size=1, stride=1),
            nn.Conv2d(64, 64, kernel_size=1, stride=1),
        )

        self.decoder = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=1, stride=1),
            nn.Conv2d(64, 32, kernel_size=1, stride=1),
            nn.Conv2d(32, input_shape[0], kernel_size=1, stride=1),
        )

Everything works fine; I can train it.

But what I need is a reduced-dimension encoding, which requires a new linear layer with a dimension N much lower than the image dimension, so that I can extract its activations.

If anybody can help me fit a linear layer into the decoder part, I would appreciate it. (I know how to Flatten() the data, but I guess I need to “unflatten” it again to interface with the Conv2d layers.)

Thank you in advance.
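A minimal sketch of one way to do this, assuming a single-channel 28x28 input and a bottleneck size of 8 (both hypothetical, as are the names LinearBottleneckAE and latent_dim). It appends nn.Flatten and nn.Linear to the encoder, then starts the decoder with nn.Linear followed by nn.Unflatten, which restores the [B, C, H, W] shape that nn.Conv2d expects. The ReLU activations are an addition that is not in the original layers.

```python
import torch
import torch.nn as nn


class LinearBottleneckAE(nn.Module):
    """Hypothetical conv autoencoder with a linear bottleneck of size latent_dim."""

    def __init__(self, input_shape=(1, 28, 28), latent_dim=8):
        super().__init__()
        c, h, w = input_shape
        # The 1x1 convolutions from the question keep H and W unchanged.
        self.encoder = nn.Sequential(
            nn.Conv2d(c, 32, kernel_size=1, stride=1),
            nn.ReLU(),  # assumption: activations added, not in the original
            nn.Conv2d(32, 64, kernel_size=1, stride=1),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=1, stride=1),
            nn.ReLU(),
            nn.Flatten(),                        # [B, 64, H, W] -> [B, 64*H*W]
            nn.Linear(64 * h * w, latent_dim),   # the reduced-dimension code
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * h * w),
            nn.ReLU(),
            nn.Unflatten(1, (64, h, w)),         # [B, 64*H*W] -> [B, 64, H, W]
            nn.Conv2d(64, 64, kernel_size=1, stride=1),
            nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=1, stride=1),
            nn.ReLU(),
            nn.Conv2d(32, c, kernel_size=1, stride=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


model = LinearBottleneckAE()
x = torch.randn(4, 1, 28, 28)
code = model.encoder(x)   # shape [4, 8]
recon = model(x)          # shape [4, 1, 28, 28]
```

After training, calling model.encoder(x) on its own returns the N-dimensional activations directly, so no separate extraction step is needed.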

After doing some research on PyTorch, I have come up with the following:

        self.encoder = nn.Sequential(
            nn.Conv2d(input_shape[0], 32, kernel_size=8, stride=4),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.Conv2d(64, 8, kernel_size=3, stride=1),
            nn.MaxPool2d(7, stride=1),
        )

        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 64, kernel_size=3, stride=1),
            nn.Conv2d(64, 32, kernel_size=4, stride=2),
            nn.Conv2d(32, input_shape[0], kernel_size=8, stride=4),
        )

This gives me an 8-dimensional bottleneck at the output of the encoder, torch.Size([1, 8, 1, 1]), which is just what I wanted in the “bottleneck” layer.
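The thread does not state the input size. As a sketch, assuming a single-channel 84x84 input, passing a dummy tensor through the encoder one layer at a time shows how each layer shrinks the spatial size, following the Conv2d/MaxPool2d rule H_out = floor((H_in + 2*padding - kernel_size) / stride) + 1 with padding 0 and dilation 1:

```python
import torch
import torch.nn as nn

# Assumption: 1-channel 84x84 input (not stated in the thread).
encoder = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=8, stride=4),   # floor((84 - 8) / 4) + 1 = 20
    nn.Conv2d(32, 64, kernel_size=4, stride=2),  # floor((20 - 4) / 2) + 1 = 9
    nn.Conv2d(64, 8, kernel_size=3, stride=1),   # 9 - 3 + 1 = 7
    nn.MaxPool2d(7, stride=1),                   # 7 - 7 + 1 = 1
)

x = torch.randn(1, 1, 84, 84)
for layer in encoder:
    x = layer(x)
    print(type(layer).__name__, tuple(x.shape))
# Conv2d (1, 32, 20, 20)
# Conv2d (1, 64, 9, 9)
# Conv2d (1, 8, 7, 7)
# MaxPool2d (1, 8, 1, 1)
```

Printing shapes like this is a quick way to find which layer breaks before running a full forward pass.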

What I cannot do is train the autoencoder with:

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

The decoder gives me an error:

    Calculated padded input size per channel: (3 x 3). Kernel size: (4 x 4). Kernel size can't be greater than actual input size

The output of your first nn.ConvTranspose2d layer will be [batch_size, 64, 3, 3], which is too small for the next one, which uses a kernel size of 4.
You could use a kernel size of 4 for the first nn.ConvTranspose2d layer.
This, however, will yield an error in the last one, since you are using nn.Conv2d layers afterwards in your decoder. Is that a typo or do you really want to lower the spatial size again?
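One way to avoid the shrinking problem is to use nn.ConvTranspose2d for every decoder layer and choose kernel sizes that mirror the encoder. For transposed convolutions (padding 0, dilation 1, output_padding 0) the size rule is H_out = (H_in - 1) * stride + kernel_size. The sketch below is one possible mirror, not the exact configuration suggested above, and assumes the same hypothetical 1-channel 84x84 input so the encoder output is [B, 8, 1, 1]:

```python
import torch
import torch.nn as nn

# Assumption: encoder output is [B, 8, 1, 1] for a 1-channel 84x84 input.
decoder = nn.Sequential(
    nn.ConvTranspose2d(8, 64, kernel_size=7, stride=1),   # (1 - 1) * 1 + 7 = 7
    nn.ConvTranspose2d(64, 64, kernel_size=3, stride=1),  # (7 - 1) * 1 + 3 = 9
    nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2),  # (9 - 1) * 2 + 4 = 20
    nn.ConvTranspose2d(32, 1, kernel_size=8, stride=4),   # (20 - 1) * 4 + 8 = 84
)

z = torch.randn(1, 8, 1, 1)   # a dummy bottleneck code
out = decoder(z)
print(tuple(out.shape))       # (1, 1, 84, 84)
```

The first layer only reverses the 7 -> 1 pooling in terms of size; max pooling itself is not invertible.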

Thank you for the fast reply.

I have managed to get my autoencoder “right”. I have a 2x2x2 bottleneck layer, i.e. 8 latent values, like this:

        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.Conv2d(64, 2, kernel_size=3, stride=1),
            nn.MaxPool2d(6, stride=1),
        )

        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(2, 64, kernel_size=3, stride=1),
            nn.ConvTranspose2d(64, 32, kernel_size=8, stride=4),
            nn.ConvTranspose2d(32, 1, kernel_size=8, stride=4),
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
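To extract the reduced-dimension data from this model, the encoder can be called on its own and its [B, 2, 2, 2] output flattened to [B, 8]. A sketch, assuming a single-channel 84x84 input (which is what makes the layers above produce a 2x2 bottleneck and an 84x84 reconstruction); the class name ConvAE and the random batch are hypothetical:

```python
import torch
import torch.nn as nn


class ConvAE(nn.Module):
    # Layers copied from the post; the 84x84 input size is an assumption.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4),   # 84 -> 20
            nn.Conv2d(32, 64, kernel_size=4, stride=2),  # 20 -> 9
            nn.Conv2d(64, 2, kernel_size=3, stride=1),   # 9 -> 7
            nn.MaxPool2d(6, stride=1),                   # 7 -> 2
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(2, 64, kernel_size=3, stride=1),   # 2 -> 4
            nn.ConvTranspose2d(64, 32, kernel_size=8, stride=4),  # 4 -> 20
            nn.ConvTranspose2d(32, 1, kernel_size=8, stride=4),   # 20 -> 84
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


model = ConvAE()
images = torch.rand(5, 1, 84, 84)  # hypothetical batch of images

with torch.no_grad():  # no gradients needed when only extracting codes
    codes = model.encoder(images).flatten(start_dim=1)  # [5, 2, 2, 2] -> [5, 8]
    recon = model(images)                               # [5, 1, 84, 84]
```

Each row of codes is the 8-value representation of one image, which can then be fed to a clustering or visualization step.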

My training is simple: the loss measures how much the prediction differs from the original image:

    distance = nn.MSELoss()

    loss = distance(states,output)
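For reference, a minimal sketch of a full training step around this loss. It reuses the layers from the post and SGD with lr=0.005 as mentioned in this thread; the random states tensor in [0, 1] is a hypothetical stand-in for real data. Note that MSE scales with the square of the pixel range, so raw 0-255 images give losses thousands of times larger than images scaled to [0, 1].

```python
import math

import torch
import torch.nn as nn
import torch.optim as optim


class ConvAE(nn.Module):
    # Same layers as in the post; assumes 1-channel 84x84 inputs.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.Conv2d(64, 2, kernel_size=3, stride=1),
            nn.MaxPool2d(6, stride=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(2, 64, kernel_size=3, stride=1),
            nn.ConvTranspose2d(64, 32, kernel_size=8, stride=4),
            nn.ConvTranspose2d(32, 1, kernel_size=8, stride=4),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


torch.manual_seed(0)
model2 = ConvAE()
distance = nn.MSELoss()
optimizer = optim.SGD(model2.parameters(), lr=0.005)

states = torch.rand(16, 1, 84, 84)  # hypothetical stand-in batch in [0, 1]

losses = []
for step in range(5):
    optimizer.zero_grad()            # clear gradients from the previous step
    output = model2(states)          # reconstruction, same shape as states
    loss = distance(output, states)  # MSELoss(input, target)
    loss.backward()                  # backpropagate the reconstruction error
    optimizer.step()                 # update encoder and decoder weights
    losses.append(loss.item())
```

Forgetting optimizer.zero_grad() or optimizer.step() is a common reason for a loss that never changes, so it is worth checking those calls first.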

But now the problem is that the network is not learning: the loss stays constant at around 10k, which is bad,

even with a high learning rate:

    optimizer_encoder = optim.SGD(model2.parameters(), lr=0.005)

I have no idea why.

I have removed


on the encoder end, and now the loss is in a normal range and decreasing.

This is for anyone who has the same problem or is building an unsupervised autoencoder.