I’ve created a variational autoencoder to encode 1-dimensional arrays. The encoding is done through three 1d-convolutional layers. Then, after the sampling trick, I reconstruct the series using three fully connected layers. Here are my questions, in case someone can shed some light on them:

I think it would be better to use 1d-deconvolutional layers instead of fully connected ones, but I cannot understand precisely why.

Is it because it would give better results? But then, if the FC layers are complex enough, they should be able to achieve the same results, right?

Is it because it would be more efficient? That is, would it get the same results as a complex enough FC layer but with less training and fewer parameters?

Or is it for other reasons that I’m missing?
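To make the setup concrete, here is a minimal sketch of how the sequence lengths work out when three strided 1d convolutions are mirrored by three transposed ("deconvolutional") layers. The input length (256), kernel size (4), stride (2) and padding (1) are made-up values, since the post does not give any; the two formulas are the usual output-length rules for 1d convolutions and transposed convolutions.

```python
def conv1d_out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1d convolution."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_transpose1d_out_len(l_in, kernel_size, stride=1, padding=0,
                             dilation=1, output_padding=0):
    """Output length of a 1d transposed convolution."""
    return ((l_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# Hypothetical hyperparameters, not taken from the original post.
KERNEL, STRIDE, PADDING = 4, 2, 1

encoder_lengths = [256]
for _ in range(3):  # three conv layers, each halving the length
    encoder_lengths.append(
        conv1d_out_len(encoder_lengths[-1], KERNEL, STRIDE, PADDING))

decoder_lengths = [encoder_lengths[-1]]
for _ in range(3):  # three transposed conv layers, each doubling it
    decoder_lengths.append(
        conv_transpose1d_out_len(decoder_lengths[-1], KERNEL, STRIDE, PADDING))

print(encoder_lengths)
print(decoder_lengths)
```

With these particular values the decoder lengths are the encoder lengths in reverse, which is what makes a mirrored deconvolutional decoder easy to wire up.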

Hello, I’m no expert on 1d-convolutions, but let’s give it a go.

For image data there has been a shift away from fully connected layers towards conv layers. One reason for this is that conv layers have a spatial awareness that fully connected layers lack.

The same goes for one dimension. If you use 1d-deconv you keep the spatial awareness throughout the network. With fully connected layers, every output is influenced by every input position, and I’m guessing that could confuse the network. In essence, I believe it’s easier to mess up and therefore harder to train.
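One way to picture the "spatial awareness" point: a transposed 1d convolution is the same computation as a fully connected layer whose weight matrix is forced to be mostly zeros (locality) and to reuse the same few values in every column (weight sharing). The sketch below is plain Python, single channel, no padding, with made-up numbers; `equivalent_fc_matrix` is a hypothetical helper written for this illustration, not a library function.

```python
def conv_transpose1d(x, kernel, stride=1):
    """Single-channel transposed 1d convolution, no padding."""
    out_len = (len(x) - 1) * stride + len(kernel)
    y = [0.0] * out_len
    for i, xi in enumerate(x):
        for k, wk in enumerate(kernel):
            # Each input sample "stamps" a scaled copy of the kernel.
            y[i * stride + k] += xi * wk
    return y


def equivalent_fc_matrix(in_len, kernel, stride=1):
    """Dense (out_len x in_len) matrix that reproduces the layer above."""
    out_len = (in_len - 1) * stride + len(kernel)
    W = [[0.0] * in_len for _ in range(out_len)]
    for i in range(in_len):
        for k, wk in enumerate(kernel):
            W[i * stride + k][i] = wk
    return W


def fc(W, x):
    """Bias-free fully connected layer: y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


x = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 1.0, 0.5]

W = equivalent_fc_matrix(len(x), kernel, stride=2)
y_deconv = conv_transpose1d(x, kernel, stride=2)
y_fc = fc(W, x)

total_entries = len(W) * len(W[0])
nonzero_entries = sum(1 for row in W for w in row if w != 0.0)
print(y_deconv == y_fc, nonzero_entries, "of", total_entries)
```

So an unconstrained FC layer can represent this mapping, but it has to discover the zeros and the repeated weights from data, whereas the deconv layer has them built in.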

I would love to hear back if you find some other explanation.

Yes, sure. Let’s see what people say. I’m quite sure it is about efficiency because, in theory, a sufficiently large network should be able to approximate any function. Even so, I do not know whether deconvolutions can achieve that faster than FC layers, or whether they are less prone to getting stuck in a sub-optimal minimum. I guess they are a world of their own.
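On the efficiency side, the parameter counts alone already differ a lot. Below is a rough sketch comparing a three-layer FC decoder with a three-layer transposed-convolution decoder that pass through activations of the same size. All layer sizes are invented for illustration (the thread gives none); the counting rules are the standard ones, weights plus one bias per output unit or output channel.

```python
def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def deconv1d_params(c_in, c_out, kernel_size):
    """Weights plus biases of a 1d transposed convolution layer."""
    return c_in * c_out * kernel_size + c_out


# Hypothetical decoder activations as (channels, length):
# (32, 32) -> (16, 64) -> (8, 128) -> (1, 256)
shapes = [(32, 32), (16, 64), (8, 128), (1, 256)]
kernel_size = 4  # assumed

fc_total = 0
deconv_total = 0
for (c_in, l_in), (c_out, l_out) in zip(shapes, shapes[1:]):
    # The FC layer works on the flattened activation of the same size.
    fc_total += fc_params(c_in * l_in, c_out * l_out)
    deconv_total += deconv1d_params(c_in, c_out, kernel_size)

print("FC decoder parameters:    ", fc_total)
print("deconv decoder parameters:", deconv_total)
```

Fewer parameters does not by itself guarantee faster or better training, but it does mean there is much less for the optimizer to fit from the same amount of data.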

A 1-d convolution, in the kernel-size-1 (1x1) sense, can be seen as a fully connected layer applied per pixel across channels. In other words, it is a linear combination of the pixel at position (x, y) in each channel, while an FC layer is a linear combination of all the pixels of all the channels. So they are essentially not the same operation and thus can have different effects.
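The "fully connected per pixel" reading can be checked with a small sketch, here for a 1d signal with several channels and a kernel of size 1. This is plain Python with made-up numbers; both functions are written just for this illustration.

```python
def conv1d_k1(x, weight, bias):
    """Kernel-size-1 convolution.

    x: [c_in][length], weight: [c_out][c_in], bias: [c_out].
    """
    c_in, length = len(x), len(x[0])
    return [
        [bias[o] + sum(weight[o][c] * x[c][t] for c in range(c_in))
         for t in range(length)]
        for o in range(len(weight))
    ]


def fc_per_position(x, weight, bias):
    """Apply one small FC layer to the channel vector at every position."""
    c_in, length = len(x), len(x[0])
    columns = []
    for t in range(length):
        channel_vec = [x[c][t] for c in range(c_in)]
        columns.append([
            bias[o] + sum(w * v for w, v in zip(weight[o], channel_vec))
            for o in range(len(weight))
        ])
    # Transpose from [length][c_out] back to [c_out][length].
    return [list(row) for row in zip(*columns)]


x = [[1.0, 2.0, 3.0],        # channel 0
     [4.0, 5.0, 6.0]]        # channel 1
weight = [[1.0, -1.0],
          [0.5, 0.5],
          [2.0, 0.0]]        # 3 output channels, 2 input channels
bias = [0.0, 1.0, -1.0]

out_conv = conv1d_k1(x, weight, bias)
out_fc = fc_per_position(x, weight, bias)
print(out_conv == out_fc)
```

With a kernel larger than 1, each output also mixes neighbouring positions, but still only a local window of them, whereas a full FC layer mixes every position of every channel.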

In my opinion, if your data is 1-dimensional you should use FC layers, unless you know that there is a special relation between different dimensions of that vector; in that case you can reshape it into 2D and perform 1d convolution operations.

However, the best thing you can do is try both ways and see which works better.

Thanks for answering. Actually, the vector is a time series that should contain temporal patterns, so the best approach for encoding should be 1d convolutions. For the decoder, I actually tried both approaches and got similar reconstruction losses. I guess for my data there is no real difference.