Does nn.Embedding detach tensor from computation graph?


I am currently trying to train a GAN model in the context of machine translation.

My problem is similar to: GitHub - ngohoanhkhoa/GAN-NMT: Generative Adversarial Networks in Neural Machine Translation ; although they use TensorFlow and the data is different.

My problem is that during training the gradients of my generator are None, which means the model does not train at all. The generator is based on a seq2seq (LSTM) model. The discriminator is a CNN. Both models rely on nn.Embedding as a first step to encode the sequences I input into the models.

I tried to fix the problem for some time, without success. My assumption is that when I forward the generator output to the discriminator, which derives embeddings using nn.Embedding, some detachment takes place. Can this be the case?

Your help is highly appreciated. Thanks.



No, no detach takes place in that layer.
But you most likely give it Tensors that don’t require gradients as input, no?
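A minimal sketch (with hypothetical sizes) can confirm this: nn.Embedding itself does not detach anything, since its output requires grad through the embedding weight; it is the integer index input that can never carry gradients.

```python
import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=10, embedding_dim=4)

idx = torch.tensor([1, 3, 5])  # integer indices: can never require grad
out = emb(idx)

print(idx.requires_grad)  # False -- index tensors cannot track gradients
print(out.requires_grad)  # True  -- grad flows back into emb.weight
```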

Thanks for your reply. The input to the Generator has .requires_grad set to false.

I calculate the loss on the Discriminator output. Calling .backward() creates gradients for the Discriminator, but not for the Generator that produces the Discriminator's input.

Let me give you some more details about the data processing:

  1. The sequences I input into the model are encoded.
  2. Based on these encodings the embeddings are created inside the generator
  3. The output is ‘decoded’ again using the former character-encoding mapping. This is done because I feed both the Generator output and some encoded target sequences into the Discriminator; the target sequences do not pass through the Generator, which means they are not embedded.
  4. In the Discriminator sequences are again embedded
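The steps above can be sketched roughly as follows (hypothetical shapes and names; `gen_logits` stands in for the Generator output). Step 3, decoding back to integer token ids, is where the graph gets cut:

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 10, 4
# Stand-in for the Generator output (step 2): a differentiable float tensor
gen_logits = torch.randn(3, vocab_size, requires_grad=True)

token_ids = gen_logits.argmax(dim=-1)         # step 3: 'decode' to integer ids
disc_emb = nn.Embedding(vocab_size, emb_dim)  # step 4: re-embed in the Discriminator
d_in = disc_emb(token_ids)

print(gen_logits.requires_grad)  # True
print(token_ids.requires_grad)   # False -- argmax/int conversion detached it
print(d_in.requires_grad)        # True, but only w.r.t. disc_emb.weight, not the Generator
```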

I noticed that the output of the Generator has requires_grad set to True. When I map the data back to the original encoding, requires_grad is set to False. But I am not sure if this is the problem.

Well, the thing is that Embedding layers take indices as input (Tensors of integer type). And such Tensors cannot track gradients (because they are not continuous). So I still think that you convert to an integer type between the two models and thus don’t get gradients for your generator.
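One common workaround (not from this thread, just a hedged sketch) is to skip the integer decoding entirely and feed the Discriminator a "soft" embedding: multiply the Generator's softmax distribution by the Discriminator's embedding matrix, which keeps everything float-typed and on the graph:

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 10, 4
disc_emb = nn.Embedding(vocab_size, emb_dim)

# Stand-in for the Generator output
gen_logits = torch.randn(3, vocab_size, requires_grad=True)

probs = gen_logits.softmax(dim=-1)  # differentiable, float
soft_in = probs @ disc_emb.weight   # (3, emb_dim): expected embedding, stays on the graph

soft_in.sum().backward()
print(gen_logits.grad is not None)  # True: gradients reach the Generator
```

The trade-off is that the Discriminator now sees mixtures of embeddings rather than real token sequences, which is why this trick is usually paired with sampling schemes such as Gumbel-softmax in GAN-for-text setups.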

Ah, I get it now. I’ll check that. Thanks.