BPTT with linear layers

I’m using a recurrent VAE to encode a path and then decode that path into a new similar path. The encoding portion consists of a GRU and 3 linear layers, and the decoding portion consists of a GRU and 2 linear layers.

I’m not sure how to train this network given that there are RNNs as well as linear layers. I imagine I’ll have to use BPTT because of the RNN, but I’m not sure if I can use BPTT on the entire network or if I have to single out the recurrent layers, do BPTT on them, then perform normal backprop on the other layers.