Hi all,

For autoencoders, it seems to be common practice to build the decoder so that it mirrors the encoder architecture. However, there seem to be two ways of doing this. The following example aims to create embedding vectors of length 1024 for time series inputs, using a CNN layer and an LSTM pooling layer (note that we use the LSTM's last hidden state as the output). Which of these is the correct “mirror” of the encoder?

(For convenience, I’m also counting the total and trainable parameters of each model.)
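
As a cross-check on those counts, the numbers can also be worked out by hand from the layer shapes. A rough sketch in plain Python, assuming PyTorch's default layouts (biases enabled, single-layer unidirectional LSTM with two bias vectors per gate); the helper names are made up for this illustration and are not part of PyTorch:

```python
def conv1d_params(c_in, c_out, k):
    # Weight has c_in * c_out * k entries plus one bias per output channel.
    # ConvTranspose1d has the same count (weight stored as (c_in, c_out, k)).
    return c_in * c_out * k + c_out


def lstm_params(n_in, n_hidden):
    # 4 gates, each with input weights, recurrent weights and two bias vectors.
    return 4 * (n_in * n_hidden + n_hidden * n_hidden + 2 * n_hidden)


def linear_params(n_in, n_out):
    # Weight matrix plus one bias per output feature.
    return n_in * n_out + n_out


encoder_count = conv1d_params(2, 128, 3) + lstm_params(128, 512) + linear_params(512, 1024)
decoder_v1_count = lstm_params(1024, 512) + conv1d_params(512, 128, 3) + linear_params(128, 2)
decoder_v2_count = linear_params(1024, 512) + lstm_params(512, 128) + conv1d_params(128, 2, 3)

print(encoder_count, decoder_v1_count, decoder_v2_count)  # 1841024 3346818 854274
```

Since nothing is frozen, the total and trainable counts are identical for every model.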

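# Encoder
import torch

# Conv1d: 2 input channels -> 128 feature channels, kernel size 3
a = torch.nn.Conv1d(2, 128, 3, stride=1)
# LSTM used as a pooling layer over the time axis
b = torch.nn.LSTM(input_size=128, hidden_size=512, num_layers=1)
# Project the 512-dim hidden state to the 1024-dim embedding
c = torch.nn.Linear(512, 1024)

# Sequential only groups the layers for parameter counting here;
# nn.LSTM returns a tuple, so the stack cannot be called end to end as is.
model_1 = torch.nn.Sequential(a, b, c)

num_all_1 = sum(p.numel() for p in model_1.parameters())
num_train_1 = sum(p.numel() for p in model_1.parameters() if p.requires_grad)
print("%s and %s" % (num_all_1, num_train_1))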
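
To make the "last hidden state as output" idea concrete, here is a minimal sketch of how the three encoder layers could be wired into a forward pass. The `Encoder` class name, the `batch_first=True` setting, and the example batch shape are my assumptions for illustration; the layer sizes are the ones from the snippet above.

```python
import torch
from torch import nn


class Encoder(nn.Module):
    """Hypothetical wrapper around the three encoder layers."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(2, 128, 3, stride=1)
        # batch_first=True is an assumption made for readability.
        self.lstm = nn.LSTM(input_size=128, hidden_size=512,
                            num_layers=1, batch_first=True)
        self.fc = nn.Linear(512, 1024)

    def forward(self, x):
        # x: (batch, 2, length)
        feats = self.conv(x)             # (batch, 128, length - 2)
        feats = feats.permute(0, 2, 1)   # (batch, length - 2, 128)
        _, (h_n, _) = self.lstm(feats)   # h_n: (1, batch, 512)
        return self.fc(h_n[-1])          # (batch, 1024)


encoder = Encoder()
x = torch.randn(4, 2, 50)  # made-up batch: 4 series, 2 channels, 50 steps
z = encoder(x)
print(z.shape)  # torch.Size([4, 1024])
```

The permute is needed because `Conv1d` works on `(batch, channels, length)` while a batch-first LSTM expects `(batch, length, features)`.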

# Decoder version 1
# Widths mirror the encoder (1024 -> 512 -> 128 -> 2), ending with a Linear layer
a = torch.nn.LSTM(input_size=1024, hidden_size=512, num_layers=1)
b = torch.nn.ConvTranspose1d(512, 128, 3, stride=1)
c = torch.nn.Linear(128, 2)

model_2 = torch.nn.Sequential(a, b, c)

num_all_2 = sum(p.numel() for p in model_2.parameters())
num_train_2 = sum(p.numel() for p in model_2.parameters() if p.requires_grad)
print("%s and %s" % (num_all_2, num_train_2))

# Decoder version 2
# Layer types in the reverse order of the encoder: Linear -> LSTM -> ConvTranspose1d
a = torch.nn.Linear(1024, 512)
b = torch.nn.LSTM(input_size=512, hidden_size=128, num_layers=1)
c = torch.nn.ConvTranspose1d(128, 2, 3, stride=1)

model_3 = torch.nn.Sequential(a, b, c)

num_all_3 = sum(p.numel() for p in model_3.parameters())
num_train_3 = sum(p.numel() for p in model_3.parameters() if p.requires_grad)
print("%s and %s" % (num_all_3, num_train_3))
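
To compare the two decoder variants beyond parameter counts, here is a rough sketch that wires each one into a forward pass and checks the output shape. Several things are assumptions on my part and not part of the snippets above: the class names `DecoderV1` and `DecoderV2`, the `batch_first=True` setting, and above all the choice to feed the embedding to the LSTM by repeating it at every time step (one common option, possibly not the intended one).

```python
import torch
from torch import nn


class DecoderV1(nn.Module):
    """Decoder version 1: LSTM -> ConvTranspose1d -> Linear."""

    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1024, hidden_size=512,
                            num_layers=1, batch_first=True)
        self.deconv = nn.ConvTranspose1d(512, 128, 3, stride=1)
        self.fc = nn.Linear(128, 2)

    def forward(self, z, steps):
        # Assumption: the same embedding is fed at every time step.
        seq = z.unsqueeze(1).repeat(1, steps, 1)   # (batch, steps, 1024)
        out, _ = self.lstm(seq)                    # (batch, steps, 512)
        out = self.deconv(out.permute(0, 2, 1))    # (batch, 128, steps + 2)
        out = self.fc(out.permute(0, 2, 1))        # (batch, steps + 2, 2)
        return out.permute(0, 2, 1)                # (batch, 2, steps + 2)


class DecoderV2(nn.Module):
    """Decoder version 2: Linear -> LSTM -> ConvTranspose1d."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(1024, 512)
        self.lstm = nn.LSTM(input_size=512, hidden_size=128,
                            num_layers=1, batch_first=True)
        self.deconv = nn.ConvTranspose1d(128, 2, 3, stride=1)

    def forward(self, z, steps):
        h = self.fc(z)                             # (batch, 512)
        # Assumption: the projected embedding is repeated across time.
        seq = h.unsqueeze(1).repeat(1, steps, 1)   # (batch, steps, 512)
        out, _ = self.lstm(seq)                    # (batch, steps, 128)
        return self.deconv(out.permute(0, 2, 1))   # (batch, 2, steps + 2)


z = torch.randn(4, 1024)  # made-up batch of 4 embeddings
steps = 48                # a length-50 input leaves 48 steps after the kernel-3 conv
out_v1 = DecoderV1()(z, steps)
out_v2 = DecoderV2()(z, steps)
print(out_v1.shape, out_v2.shape)
```

Under these assumptions both variants return a tensor of shape `(4, 2, 50)`, the same as a length-50 encoder input, so the output shape alone does not decide between them.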