How does .expand() affect gradient?

I am working with an autoencoder, and I use latent.view(batch_size, 1, -1).expand(-1, <max_length>, -1) to add a time-step dimension and repeat the latent vector across that many time steps before feeding it into the decoder RNN.
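For concreteness, a minimal sketch of what I mean (the sizes here are just placeholders, not my actual model):

```python
import torch

batch_size, latent_dim, max_length = 32, 64, 10  # placeholder sizes

latent = torch.randn(batch_size, latent_dim, requires_grad=True)

# Add a time-step dimension and repeat the latent across max_length steps
# without copying memory: (batch, latent_dim) -> (batch, max_length, latent_dim)
decoder_input = latent.view(batch_size, 1, -1).expand(-1, max_length, -1)
print(decoder_input.shape)  # torch.Size([32, 10, 64])
```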

When I use .expand() in this way, how does the gradient backpropagate through it? I need to ensure that the gradient accumulates from each time step and is stored in the latent vector so that I can then also backprop through the encoder.

Confirmation that what I am doing gets this done would be much appreciated.

Sorry for chiming in without providing an answer.

I’m also trying to get the hang of autoencoders, be they based on CNNs or RNNs. I was just wondering: why do you need expand() for an RNN autoencoder? Assuming that you use the same parameters for the encoder and decoder, the hidden state of the encoder can be used directly for the decoder. Sure, you can first push it through a bottleneck to reduce the dimension, but I cannot see the need for expand(). The shape of the hidden state is (num_layers*num_directions, batch_size, embed_dim). Why is there a need for a <max_length>?

In my current RNN-based autoencoder I don’t need that. But it’s not unlikely that I’m doing it wrong :).

I am using expand() to duplicate the latent vector so that the decoder RNN receives the latent space as an input at every time step.

It seems to me that giving the RNN both the latent vector and the previous time step’s output at every step provides more information for making better predictions. However, I haven’t done head-to-head comparisons.
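Roughly, something like this, with made-up sizes and a plain GRU standing in for my actual decoder:

```python
import torch
import torch.nn as nn

# Placeholder sizes, not from my actual model
batch_size, max_length, latent_dim, emb_dim, hidden_dim = 32, 10, 64, 16, 128

latent = torch.randn(batch_size, latent_dim, requires_grad=True)
prev_outputs = torch.randn(batch_size, max_length, emb_dim)  # embedded previous-step tokens

# Repeat the latent across time and concatenate it with the per-step input
latent_rep = latent.view(batch_size, 1, -1).expand(-1, max_length, -1)
decoder_in = torch.cat([prev_outputs, latent_rep], dim=-1)  # (batch, T, emb_dim + latent_dim)

decoder = nn.GRU(emb_dim + latent_dim, hidden_dim, batch_first=True)
out, h_n = decoder(decoder_in)
```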

I have to admit that I don’t quite understand what you’re doing. Any chance you could share your code?

In my approach I essentially start with the [PyTorch Seq2Seq Tutorial](https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html) (just without attention), where input and target are the same sequence instead of different languages. To extend it to an autoencoder, I’ve made the following changes (roughly sketched after the list):

  • Encoder:
    • Flatten hidden state from (num_layers*num_directions, batch_size, hidden_dim) to (batch_size, num_directions*num_layers*hidden_dim)
    • Push flattened hidden state to one or more linear layers to reduce dimension
  • Decoder:
    • Push flattened hidden state to one or more linear layers to increase dimension
    • Unflatten hidden state from (batch_size, num_directions*num_layers*hidden_dim) to (num_layers*num_directions, batch_size, hidden_dim)
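
The sketch below shows the idea; the sizes and the exact flatten order are placeholders, not my actual code:

```python
import torch
import torch.nn as nn

# Placeholder sizes; the real values depend on the encoder/decoder config
num_layers, num_directions, batch_size, hidden_dim, latent_dim = 2, 2, 32, 128, 64
flat_dim = num_layers * num_directions * hidden_dim

hidden = torch.randn(num_layers * num_directions, batch_size, hidden_dim)

# Encoder side: flatten the hidden state and reduce it to the latent dimension
flat = hidden.permute(1, 0, 2).reshape(batch_size, flat_dim)
to_latent = nn.Linear(flat_dim, latent_dim)
latent = to_latent(flat)

# Decoder side: expand back to the flat size and unflatten to the RNN's shape
from_latent = nn.Linear(latent_dim, flat_dim)
unflat = from_latent(latent).reshape(batch_size, num_layers * num_directions, hidden_dim)
decoder_hidden = unflat.permute(1, 0, 2).contiguous()  # (num_layers*num_directions, batch, hidden_dim)
```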

The code can be viewed in my Github repository.

I really don’t know if this is the right way to go. If I only flatten and unflatten the hidden state (i.e., without the linear layers), it essentially becomes the Seq2Seq tutorial. At least I can make sure that I do the flatten/unflatten correctly.

The backward of expand/broadcast is sum, so this would seem to do the accumulation you want.
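A quick way to convince yourself:

```python
import torch

latent = torch.randn(3, 4, requires_grad=True)

expanded = latent.view(3, 1, 4).expand(-1, 5, -1)  # repeat across 5 "time steps"
loss = expanded.sum()                              # toy loss touching every step
loss.backward()

# Each element of latent contributed to 5 positions, so its gradient is the
# sum over those positions: here a tensor full of 5s.
print(latent.grad)
```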

Best regards

Thomas