Hi all!
I am working on a dataset of ~300 samples with ~5000 data points each, with values ranging between 0 and 1. I am interested in:
- grouping samples by similarity;
- finding the differences between groups.
Would it make sense to train an autoencoder to reduce the dimensionality to N features, then take the output of the encoder and use it as the input to an unsupervised clustering algorithm (e.g. k-means or DBSCAN)?
If so, is it correct to use a sigmoid and ReLU activation for the encoder and decoder outputs, respectively?
The AE architecture:
import torch
import torch.nn as nn
import torch.nn.functional as F

class AE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder_hidden_layer_1 = nn.Linear(in_features=4979, out_features=3000)
        self.encoder_hidden_layer_2 = nn.Linear(in_features=3000, out_features=1500)
        self.encoder_output_layer = nn.Linear(in_features=1500, out_features=10)
        self.decoder_hidden_layer_1 = nn.Linear(in_features=10, out_features=512)
        self.decoder_hidden_layer_2 = nn.Linear(in_features=512, out_features=2000)
        self.decoder_output_layer = nn.Linear(in_features=2000, out_features=4979)

    def forward(self, features):
        # Encoder: two ReLU hidden layers, sigmoid on the 10-d latent code
        x = F.relu(self.encoder_hidden_layer_1(features))
        x = F.relu(self.encoder_hidden_layer_2(x))
        encoded = torch.sigmoid(self.encoder_output_layer(x))  # F.sigmoid is deprecated

        # Decoder must consume the activated code, not the pre-activation x
        x = F.relu(self.decoder_hidden_layer_1(encoded))
        x = F.relu(self.decoder_hidden_layer_2(x))
        decoded = F.relu(self.decoder_output_layer(x))
        return decoded, encoded