Hi everybody,
I’m new to PyTorch and trying to implement a multimodal deep autoencoder (i.e., an autoencoder with multiple inputs).
First, all inputs are encoded with the same encoder architecture; then the encoder outputs are concatenated, and the result goes through additional encoding and decoding layers:

At the end, the last decoder layer must reconstruct the inputs as multiple outputs.
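A minimal sketch of that architecture could look like the following. All layer sizes, the shared-encoder design, and the treatment of each of the 1215 rows as one sample with 1519 features are assumptions for illustration, not the original implementation:

```python
import torch
import torch.nn as nn

class MultimodalAE(nn.Module):
    """Sketch: shared encoder per modality -> concatenate -> joint
    bottleneck -> split -> shared decoder per modality."""

    def __init__(self, n_inputs, in_features=1519, hidden=256, bottleneck=64):
        super().__init__()
        self.n_inputs = n_inputs
        # shared encoder applied to every input (assumed design)
        self.encoder = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU())
        # joint encoding/decoding over the concatenated codes
        self.joint_enc = nn.Sequential(nn.Linear(n_inputs * hidden, bottleneck), nn.ReLU())
        self.joint_dec = nn.Sequential(nn.Linear(bottleneck, n_inputs * hidden), nn.ReLU())
        # shared decoder reconstructing each input
        self.decoder = nn.Linear(hidden, in_features)

    def forward(self, xs):  # xs: [batch, n_inputs, in_features]
        codes = [self.encoder(xs[:, i]) for i in range(self.n_inputs)]
        z = self.joint_enc(torch.cat(codes, dim=1))
        hs = self.joint_dec(z).chunk(self.n_inputs, dim=1)
        outs = [self.decoder(h) for h in hs]
        return torch.stack(outs, dim=1)  # same shape as xs

model = MultimodalAE(n_inputs=3)
x = torch.randn(4, 3, 1519)  # 4 samples (e.g. rows of the 1215x1519 matrices)
print(model(x).shape)        # torch.Size([4, 3, 1519])
```

Because `n_inputs` is a constructor argument, the same class handles any number of inputs between one and nine.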

Depending on the user’s choice, I have between one and nine inputs, and each input is a 1215x1519 matrix.

I’m really stuck on the first and last layers of this autoencoder.

I have another problem:
Each input is a computed embedding of a graph and, as I said before, a 1215x1519 matrix.
On the other hand, since we do not have any labels for our data, the original 9x1215x1519 data (9 is the number of inputs) is treated as the label, and a noisy version of the original data with the same shape is used as the model input; this way we try to reconstruct the input to match the labels.
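The denoising setup described above can be sketched as follows. The Gaussian noise and its scale are assumptions; the original may corrupt the data differently:

```python
import torch

# Clean graph embeddings act as the "label"; a noisy copy is the model input.
clean = torch.randn(9, 1215, 1519)             # original 9x1215x1519 data (target)
noisy = clean + 0.1 * torch.randn_like(clean)  # assumed Gaussian corruption

loss_fn = torch.nn.MSELoss()
# during training: loss = loss_fn(model(noisy), clean)
print(noisy.shape == clean.shape)  # True: input and target share one shape
```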

In another implementation of this model with TensorFlow and Keras, the developer fits the model with Keras’ fit() function:

I don’t know what exactly the Keras model is doing, as the fit method doesn’t show any information about the loss function etc.
You could thus check its internal implementation and use the same approach in PyTorch. E.g. if it’s some form of MSE loss, use nn.MSELoss in PyTorch to calculate the loss.
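If it does turn out to be an MSE reconstruction loss, a minimal PyTorch training step would look like this (the model, optimizer, and batch shapes here are placeholders, not the actual setup):

```python
import torch
import torch.nn as nn

# Placeholder autoencoder standing in for the real model
model = nn.Sequential(nn.Linear(1519, 64), nn.ReLU(), nn.Linear(64, 1519))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()  # PyTorch counterpart of a Keras 'mse' loss

noisy = torch.randn(32, 1519)  # noisy input batch
clean = torch.randn(32, 1519)  # clean target ("label")

optimizer.zero_grad()
loss = criterion(model(noisy), clean)  # compare reconstruction to clean data
loss.backward()
optimizer.step()
print(loss.item())
```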
I also don’t know what the shape represents in Keras (is dim0 the batch size? If so, the input shape looks wrong, but I’m also not deeply familiar with Keras), so you should check how each dimension is used inside the model.
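In PyTorch, dim0 of a batch tensor is the batch dimension by convention. If the nine modalities are stacked as [9, 1215, 1519] and each of the 1215 rows is meant to be one sample (an assumption about your data layout), you would permute so the sample axis comes first:

```python
import torch

data = torch.randn(9, 1215, 1519)  # [modalities, samples, features] (assumed layout)
batched = data.permute(1, 0, 2)    # -> [samples, modalities, features]
print(batched.shape)               # torch.Size([1215, 9, 1519])
```

Checking each dimension this way before feeding the model avoids silently training on the wrong axis.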