Hi all!
I am working on a dataset of ~300 samples with ~5000 data points each, with values ranging between 0 and 1. I am interested in:
- grouping samples by similarity;
- finding the differences between groups.
Would it make sense to train an autoencoder to reduce the dimensionality to N features, then take the output of the encoder and use it as the input to an unsupervised clustering algorithm (e.g. k-means or DBSCAN)?
If so, is it correct to use a sigmoid and ReLU activation for the encoder and decoder outputs, respectively?
The AE architecture:
import torch
import torch.nn as nn
import torch.nn.functional as F

class AE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder_hidden_layer_1 = nn.Linear(in_features=4979, out_features=3000)
        self.encoder_hidden_layer_2 = nn.Linear(in_features=3000, out_features=1500)
        self.encoder_output_layer = nn.Linear(in_features=1500, out_features=10)
        self.decoder_hidden_layer_1 = nn.Linear(in_features=10, out_features=512)
        self.decoder_hidden_layer_2 = nn.Linear(in_features=512, out_features=2000)
        self.decoder_output_layer = nn.Linear(in_features=2000, out_features=4979)

    def forward(self, features):
        # Encoder: two ReLU hidden layers, sigmoid on the 10-d latent code
        x = F.relu(self.encoder_hidden_layer_1(features))
        x = F.relu(self.encoder_hidden_layer_2(x))
        encoded = torch.sigmoid(self.encoder_output_layer(x))  # F.sigmoid is deprecated

        # Decoder must consume the activated code, not the pre-activation x
        x = F.relu(self.decoder_hidden_layer_1(encoded))
        x = F.relu(self.decoder_hidden_layer_2(x))
        decoded = F.relu(self.decoder_output_layer(x))
        return decoded, encoded