Combining a CNN with a GRU for a text generator

Hi everybody,
I am trying to integrate a CNN into a GRU. My model takes images through the CNN, and the CNN features are passed to the GRU frame by frame. The structure is shown in the picture.

This is my example code, which implements the structure above.
Encoder:

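import torch
import torch.nn as nn
from torchvision import models

### input image size: [batch, seq, color channels, height, width]
### expected output
class CNNencoder(nn.Module):
    def __init__(self, input_size, hidden_size, batch_size=5):
        super(CNNencoder, self).__init__()
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.hidden_size = hidden_size
        # VGG11 backbone without pretrained weights
        self.modelVGG = models.vgg11(pretrained=False)
        # (assumed) each module is moved to the selected device
        self.modelVGG = self.modelVGG.to(self.device)

        # 8192 = 512 channels * 4 * 4, the VGG11 feature size for 128x128 inputs
        self.adaptor = nn.Linear(8192, self.hidden_size)
        self.adaptor = self.adaptor.to(self.device)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True, bidirectional=False)  ## (input_size, hidden_size)
        self.gru = self.gru.to(self.device)
        self.batch_size = batch_size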

    def forward(self, input, hidden):
        seqs = input.size(1)
        for indexseq in range(seqs):
            # one frame per step: [batch, 3, 128, 128]
            inputImageBatch = input[:, indexseq, :, :, :].reshape(-1, 3, 128, 128)
            features = self.modelVGG.features(inputImageBatch)
            flat_features = features.view(features.size(0), 1, -1)  # flatten to [batch, 1, features]
            if indexseq == 0:
                output = flat_features
            else:
                # append this frame's features along the sequence dimension
                output = torch.cat((output, flat_features), dim=1)

        # output: expected [batch, seq, features]
        outputAdaptor = self.adaptor(output)
        outputGru, hidden = self.gru(outputAdaptor, hidden)
        return outputGru, hidden

    def initHidden(self):
        return torch.zeros(1, self.batch_size, self.hidden_size, device=self.device)

If I compute the loss and call loss.backward(), how will the CNN parameters receive gradients through time?
Could anyone suggest how to implement the structure above?
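
For reference, here is a minimal sketch of how the gradient question could be checked. It does not use my real model: `TinyCNNGRU`, its small CNN, and the dummy loss are hypothetical stand-ins so the check runs quickly on CPU. It also assumes the frames can be folded into the batch dimension instead of looped over. Because the same CNN weights are applied to every frame, autograd sums the contributions from all time steps into each CNN parameter's `.grad` when `loss.backward()` is called.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyCNNGRU(nn.Module):
    """Hypothetical stand-in: a small CNN applied to every frame, then a GRU."""
    def __init__(self, hidden_size=16):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 4, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(2),  # -> [N, 4, 2, 2]
        )
        self.adaptor = nn.Linear(4 * 2 * 2, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, frames, hidden=None):
        b, s, c, h, w = frames.shape
        # Fold the sequence into the batch so one CNN call covers all frames.
        feats = self.cnn(frames.reshape(b * s, c, h, w))
        feats = feats.reshape(b, s, -1)           # [batch, seq, features]
        out, hidden = self.gru(self.adaptor(feats), hidden)
        return out, hidden

model = TinyCNNGRU()
frames = torch.randn(2, 5, 3, 8, 8)               # [batch, seq, channels, height, width]
out, hidden = model(frames)

loss = out.pow(2).mean()                          # dummy loss, for illustration only
loss.backward()

# If gradients flow back through the GRU into the CNN, the first conv layer
# has a populated, non-zero .grad after backward().
conv_weight = model.cnn[0].weight
grad_reached_cnn = (
    conv_weight.grad is not None and conv_weight.grad.abs().sum().item() > 0
)
print(out.shape, grad_reached_cnn)
```

The same check (inspecting `.grad` on `self.modelVGG.features` parameters after `backward()`) should apply to the loop-based encoder above, since `torch.cat` keeps every frame's features in the autograd graph.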