Extracting reduced dimension data from autoencoder in pytorch

Hi!

I have defined my autoencoder in pytorch as follows:

        self.encoder = nn.Sequential(
            nn.Conv2d(input_shape[0], 32, kernel_size=1, stride=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=1, stride=1),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=1, stride=1),
            nn.ReLU()
        )

        self.decoder = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=1, stride=1),
            nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=1, stride=1),
            nn.ReLU(),
            nn.Conv2d(32, input_shape[0], kernel_size=1, stride=1),
            nn.ReLU(),
            nn.Sigmoid()
        )

Everything works fine, I can train it.

But what I need is a reduced-dimension encoding, which requires adding a new linear layer of dimension N (much lower than the image dimension) so that I can extract its activations.

If anybody can help me fit a linear layer into the decoder part I would appreciate it (I know how to Flatten() the data, but I guess I need to “unflatten” it again to interface with the Conv2d layers).
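For what it's worth, one way to wire a linear bottleneck between the two conv stacks is Flatten → Linear at the end of the encoder and Linear → Unflatten at the start of the decoder. A minimal sketch with made-up shapes (the [B, 64, 7, 7] conv output and N = 10 here are illustrative assumptions, not taken from the model above):

```python
import torch
import torch.nn as nn

N = 10  # assumed bottleneck size, much lower than the image dimension

# Encoder tail: flatten the conv feature map, project down to N values.
to_latent = nn.Sequential(
    nn.Flatten(),               # [B, 64, 7, 7] -> [B, 64*7*7]
    nn.Linear(64 * 7 * 7, N),   # [B, 64*7*7]   -> [B, N]
)

# Decoder head: project back up, then "unflatten" for the conv layers.
from_latent = nn.Sequential(
    nn.Linear(N, 64 * 7 * 7),   # [B, N] -> [B, 64*7*7]
    nn.Unflatten(1, (64, 7, 7)) # [B, 64*7*7] -> [B, 64, 7, 7]
)

x = torch.randn(4, 64, 7, 7)    # pretend conv-encoder output
z = to_latent(x)                # the reduced-dimension encoding
print(z.shape)                  # torch.Size([4, 10])
print(from_latent(z).shape)     # torch.Size([4, 64, 7, 7])
```

The activations you want are simply `z` here.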

Thank you in advance.

After doing some research on PyTorch, I have come up with the following:

self.encoder = nn.Sequential(
    nn.Conv2d(input_shape[0], 32, kernel_size=8, stride=4),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2),
    nn.ReLU(),
    nn.Conv2d(64, 8, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.MaxPool2d(7, stride=1)
)

self.decoder = nn.Sequential(
    nn.ConvTranspose2d(8, 64, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.Conv2d(64, 32, kernel_size=4, stride=2),
    nn.ReLU(),
    nn.Conv2d(32, input_shape[0], kernel_size=8, stride=4),
    nn.ReLU(),
    nn.Sigmoid()
)

This gives me an 8-dimensional bottleneck at the output of the encoder, with shape torch.Size([1, 8, 1, 1]), just the way I wanted the “bottleneck” layer to be.
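For anyone who came here for the title question: once trained, the reduced-dimension data can be extracted by running only the encoder. A minimal sketch (the 1-channel 84x84 input is my assumption; it is a size that reproduces the [1, 8, 1, 1] bottleneck shape above):

```python
import torch
import torch.nn as nn

# Same encoder as above, assuming a 1-channel input.
encoder = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=8, stride=4),   # 84x84 -> 20x20
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2),  # 20x20 -> 9x9
    nn.ReLU(),
    nn.Conv2d(64, 8, kernel_size=3, stride=1),   # 9x9   -> 7x7
    nn.ReLU(),
    nn.MaxPool2d(7, stride=1),                   # 7x7   -> 1x1
)

x = torch.randn(1, 1, 84, 84)
with torch.no_grad():            # no gradients needed for extraction
    code = encoder(x)
print(code.shape)                # torch.Size([1, 8, 1, 1])
features = code.flatten(1)       # [1, 8] vector, e.g. for clustering
```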

What I cannot do is train the autoencoder with:

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

The decoder gives me an error:

Calculated padded input size per channel: (3 x 3). Kernel size: (4 x 4). Kernel size can't be greater than actual input size

The output of your first nn.ConvTranspose2d layer will be [batch_size, 64, 3, 3], which is too small for the next layer, which uses a kernel size of 4.
You could use a kernel size of 4 for the first nn.ConvTranspose2d layer.
This, however, will yield an error in the last one, since you are using nn.Conv2d layers afterwards in your decoder. Is that a typo, or do you really want to lower the spatial size again?
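The shape arithmetic behind this, as a quick sketch (formulas are for the no-padding, no-dilation case, which matches the layers above):

```python
# Output-size formulas (no padding, no dilation):
# Conv2d:          out = (in - kernel) // stride + 1
# ConvTranspose2d: out = (in - 1) * stride + kernel

def conv_out(size, kernel, stride):
    return (size - kernel) // stride + 1

def convt_out(size, kernel, stride):
    return (size - 1) * stride + kernel

s = convt_out(1, kernel=3, stride=1)  # 1x1 bottleneck -> 3x3
print(s)                              # 3
# The next layer is Conv2d(kernel_size=4, stride=2) on that 3x3 map:
# kernel 4 > input 3, hence "Kernel size can't be greater than
# actual input size".
```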

Thank you for the fast reply.

I have managed to get my autoencoder “right”: I now have a 2x2x2 bottleneck layer, which gives me the 8 values I wanted, like this:

        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 2, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.MaxPool2d(6, stride=1)
        )

        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(2, 64, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

and my training is simple: the loss is how much the prediction differs from the original image:


    distance = nn.MSELoss()
    loss = distance(states, output)
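For context, the full training step around this loss would look roughly like the sketch below (`model2` here is just a stand-in placeholder module, and `states` is a random batch, both assumptions for the sake of a runnable example):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Stand-in model; in the thread this would be the autoencoder above.
model2 = nn.Sequential(nn.Conv2d(1, 1, kernel_size=1), nn.Sigmoid())
optimizer = optim.SGD(model2.parameters(), lr=0.005)
distance = nn.MSELoss()

states = torch.rand(4, 1, 8, 8)        # stand-in batch of images in [0, 1]
for epoch in range(2):
    optimizer.zero_grad()              # clear gradients from the last step
    output = model2(states)            # reconstruction
    loss = distance(states, output)    # how much it differs from the input
    loss.backward()                    # backprop
    optimizer.step()                   # update weights
print(loss.item())
```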

but now the problem is that the network is not learning (the loss is constant at around 10k, which is bad),

even with a high learning rate:

optimizer_encoder = optim.SGD(model2.parameters(), lr=0.005)

I have no idea why.

I have removed

       nn.ReLU(),
       nn.Sigmoid()

at the end of the decoder, and now the loss is in a rather normal range and decreasing.
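One likely reason removing that pair helps (my reading, not stated in the thread): ReLU clamps everything to >= 0, and the sigmoid of a non-negative number is always >= 0.5, so with both in place the reconstruction could never produce pixel values below 0.5. A quick check:

```python
import torch
import torch.nn as nn

# Sigmoid applied after ReLU can only reach [0.5, 1), never dark pixels.
x = torch.linspace(-3, 3, 7)
y = torch.sigmoid(torch.relu(x))
print(y)                        # every value is in [0.5, 1)

# A decoder tail with a single Sigmoid keeps the full (0, 1) range:
decoder_tail = nn.Sequential(
    nn.ConvTranspose2d(32, 1, kernel_size=8, stride=4),
    nn.Sigmoid(),
)
```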

Hope this helps anyone who has the same problem or is creating an unsupervised autoencoder!