Autograd in Variational Autoencoder and Classification

Hi guys, sorry, but I have a question about how the gradient is propagated in the following architecture.

I have a VAE where both the encoder and the decoder are GRUs. In particular, the encoder takes as input the embedding of a sentence (x), while the decoder takes as input the sentence (x) and the one-hot encoding of some attributes (y) of the sentence (e.g., the verb tense).

I want to connect a classifier (e.g., fully connected) that takes the hidden state of the decoder as input and outputs probabilities y_hat, to check whether the reconstruction actually contains information about the original attributes of x.
When the classifier does not predict correctly, I would like the encoder/decoder of the VAE to learn hidden representations that contain information about that attribute.
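
To make the wiring concrete, here is a minimal sketch of a toy version of this setup. Everything in it is an assumption for illustration, not my real model: `ToyVAE`, the layer sizes, the random data, and feeding only the last decoder hidden state to the classifier. It backpropagates the classifier loss alone and checks whether the encoder parameters end up with a `.grad`, then repeats the check with the hidden state detached.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


class ToyVAE(nn.Module):
    """Hypothetical stand-in for SentenceVAE: GRU encoder, GRU decoder conditioned on y."""

    def __init__(self, emb=8, hid=16, lat=4, n_attr=3):
        super().__init__()
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.to_mu = nn.Linear(hid, lat)
        self.to_logvar = nn.Linear(hid, lat)
        self.latent_to_hidden = nn.Linear(lat, hid)
        self.decoder = nn.GRU(emb + n_attr, hid, batch_first=True)

    def forward(self, x, y_onehot):
        _, h = self.encoder(x)                      # h: (1, batch, hid)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick keeps z differentiable w.r.t. mu and logvar.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        h0 = torch.tanh(self.latent_to_hidden(z))   # initial decoder state
        # Repeat the one-hot attributes at every time step of the decoder input.
        y_rep = y_onehot.unsqueeze(1).expand(-1, x.size(1), -1)
        _, h_dec = self.decoder(torch.cat([x, y_rep], dim=-1), h0)
        return h_dec.squeeze(0)                     # (batch, hid)


vae = ToyVAE()
classifier = nn.Linear(16, 3)                       # toy fully connected classifier

x = torch.randn(5, 7, 8)                            # (batch, seq_len, emb), random data
labels = torch.randint(0, 3, (5,))
y_onehot = F.one_hot(labels, num_classes=3).float()

# Case 1: classifier loss backpropagated through the decoder hidden state.
hidden_decoder = vae(x, y_onehot)
classifier_loss = F.cross_entropy(classifier(hidden_decoder), labels)
classifier_loss.backward()
encoder_has_grad = all(p.grad is not None for p in vae.encoder.parameters())
print("encoder receives gradient:", encoder_has_grad)

# Case 2: same thing, but the hidden state is detached before the classifier.
vae.zero_grad()
classifier.zero_grad()
hidden_detached = vae(x, y_onehot).detach()
F.cross_entropy(classifier(hidden_detached), labels).backward()
encoder_grad_after_detach = any(
    p.grad is not None and p.grad.abs().sum().item() > 0
    for p in vae.encoder.parameters()
)
print("encoder receives gradient after detach:", encoder_grad_after_detach)
```

In this toy version the path is classifier, decoder hidden state, decoder GRU, initial decoder state, z, mu/logvar, encoder GRU; calling `.detach()` on the hidden state is what cuts it.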

So my question is: if my training loop looks something like the following code, does the gradient flow all the way back to the encoder or not? If someone could explain the path of the gradient to me, I would really appreciate it. Thanks again.

vae = SentenceVAE()
classifier = Classifier()
model_opt = torch.optim.Adam(vae.parameters(), lr=0.001) 

for epoch in range(epochs):
        for iteration, batch in enumerate(data_loader):
            sent_recon, hidden_decoder  = vae(batch['input'], batch['length'], batch['label'])
            attributes_probability = classifier(hidden_decoder)

            # compute the classifier loss (using cross_entropy) and the vae_loss here

            loss = vae_loss - (disc_weight*classifier_loss)

            if split == 'train':