Autograd in Variational Autoencoder and Classification

Hi guys, sorry, but I have a question about how the gradient is propagated in the following architecture.

I have a VAE where both the encoder and the decoder are GRUs. In particular, the encoder takes as input the embedding of a sentence (x), while the decoder takes as input the sentence (x) and the one-hot encoding of some attributes (y) of the sentence (e.g., the verb tense).
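
For context, this is roughly what my model looks like (the names and dimensions below are just placeholders, not my real ones):

import torch
import torch.nn as nn

class SentenceVAE(nn.Module):
    # simplified sketch: the GRU encoder reads the embedded sentence x,
    # the GRU decoder reads x again together with the one-hot attributes y
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128,
                 latent_dim=32, num_attributes=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        self.latent_to_hidden = nn.Linear(latent_dim, hidden_dim)
        self.decoder = nn.GRU(embed_dim + num_attributes, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, lengths, y):
        # x: (B, T) token ids, y: (B, num_attributes) one-hot floats
        # (packing with lengths is omitted for brevity)
        emb = self.embedding(x)                                   # (B, T, E)
        _, h_enc = self.encoder(emb)                              # (1, B, H)
        mu, logvar = self.to_mu(h_enc[-1]), self.to_logvar(h_enc[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        h0 = self.latent_to_hidden(z).unsqueeze(0)                # initial decoder hidden
        y_rep = y.unsqueeze(1).expand(-1, emb.size(1), -1)        # repeat y over time
        dec_out, hidden_decoder = self.decoder(torch.cat([emb, y_rep], dim=-1), h0)
        sent_recon = self.out(dec_out)                            # (B, T, V) logits
        return sent_recon, hidden_decoder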

I want to connect a classifier (e.g., a fully connected layer) that takes as input the hidden state of the decoder and outputs some y_hat probabilities, to check whether the reconstruction actually contains information about the original attributes of x.
Now, when my classifier does not predict correctly, I would like the encoder/decoder of the VAE to learn hidden representations that contain information about that attribute.
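
The classifier I have in mind is something very simple like this (again, the sizes are placeholders):

import torch.nn as nn

class Classifier(nn.Module):
    # minimal fully connected classifier over the decoder's last hidden state
    def __init__(self, hidden_dim=128, num_attributes=4):
        super().__init__()
        self.fc = nn.Linear(hidden_dim, num_attributes)

    def forward(self, hidden_decoder):
        # hidden_decoder: (num_layers, B, H) from the GRU; take the last layer
        return self.fc(hidden_decoder[-1])  # raw logits; cross_entropy applies softmax internally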

So my question is: if my training loop looks something like the code below, does the gradient flow back all the way to the encoder or not? If someone could explain the path of the gradient to me, I would really appreciate it. Thanks again.

import torch

vae = SentenceVAE()
classifier = Classifier()
model_opt = torch.optim.Adam(vae.parameters(), lr=0.001)

for epoch in range(epochs):
    for iteration, batch in enumerate(data_loader):
        sent_recon, hidden_decoder = vae(batch['input'], batch['length'], batch['label'])
        attributes_probability = classifier(hidden_decoder)

        # here I compute classifier_loss with cross_entropy on attributes_probability,
        # plus the usual vae_loss (reconstruction + KL)

        loss = vae_loss - (disc_weight * classifier_loss)

        if split == 'train':
            model_opt.zero_grad()
            loss.backward()
            model_opt.step()
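
The only idea I had to check whether any gradient actually arrives at the encoder is to add something like this between loss.backward() and model_opt.step() (assuming my encoder submodule is literally named 'encoder'):

# debug check: which encoder parameters received a gradient?
for name, param in vae.named_parameters():
    if 'encoder' in name:
        grad = None if param.grad is None else param.grad.abs().sum().item()
        print(name, grad)  # None or 0.0 would mean nothing flowed back to the encoder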