Gradient buffer error with multiple outputs performing the same function (like a cascaded CNN)

Hi everyone. This is my first time using PyTorch for a project. When I started training my network, an error occurred:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

I wonder whether my customized loss module caused this error.
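As a sanity check, the error itself can be reproduced with a tiny graph, independent of my network (a minimal sketch, not my actual code):

```python
import torch

# backward() frees the graph's intermediate buffers by default, so a
# second backward() through the same graph raises this RuntimeError.
x = torch.ones(3, requires_grad=True)
y = (x * x).sum()   # x is saved for the backward pass of the product

y.backward()        # first call succeeds; buffers are then freed
try:
    y.backward()    # second call through the same (freed) graph
except RuntimeError as e:
    print(e)        # "Trying to backward through the graph a second time ..."

# retain_graph=True keeps the buffers alive for a later backward pass
z = (x * x).sum()
z.backward(retain_graph=True)
z.backward()        # allowed now
```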
My network has 4 stacks, and each stack tries to predict the same target at different scales, like:

pred: list of tensors [(nstack, batch, C, H1, W1), (nstack, batch, C, H2, W2), (nstack, batch, C, H3, W3), ...]
target: tensor (nstack, batch, C, H, W)

I rescale the target tensor and calculate the loss for each scale and each stack.
Can I simply compute the loss of the predicted tensors against the same target tensor, or do I need to repeat the target tensor nstack times before calculating the loss?
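For context, here is a minimal sketch of the loss computation I described, with illustrative shapes and assuming MSE loss (not my exact module). The target is rescaled per prediction scale with `F.interpolate`, the stack dimension is broadcast with `expand` rather than copied with `.repeat()`, and all losses are summed so that `backward()` is called only once:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: preds is a list of per-scale tensors, each
# (nstack, batch, C, Hj, Wj); target is a single (batch, C, H, W) tensor.
nstack, batch, C, H, W = 4, 2, 3, 32, 32
preds = [torch.randn(nstack, batch, C, H // 2**j, W // 2**j, requires_grad=True)
         for j in range(3)]
target = torch.rand(batch, C, H, W)

total_loss = 0.0
for p in preds:
    # rescale the target to this prediction's spatial size
    t = F.interpolate(target, size=p.shape[-2:], mode='bilinear',
                      align_corners=False)
    # expand (broadcast) the target across the stack dimension:
    # no memory copy, so no need to call .repeat()
    total_loss = total_loss + F.mse_loss(p, t.unsqueeze(0).expand_as(p))

# one accumulated loss -> a single backward pass over the whole graph
total_loss.backward()
```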

Here is the forward method of my network module:

    def forward(self, imgs):
        # Input: a batch of images in [0, 1], shape (N, H, W, C);
        # pre-processing was done in the data generator.
        x = imgs.permute(0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)
        x = self.pre(x)
        pred = []

        for i in range(self.nstack):  # loop over stacks
            preds_instack = []
            # the hourglass returns 5 scales of feature maps
            hourglass_feature = self.hourglass[i](x)
            features_instack = self.features[i](hourglass_feature)

            for j in range(5):  # handle 5 scales of heatmaps
                # self.outs is the per-scale output head (name is
                # a placeholder; the original snippet was cut off here)
                preds_instack.append(self.outs[i][j](features_instack[j]))
                if i != self.nstack - 1:
                    if j == 0:
                        # feed the merged prediction and feature maps back
                        # into the next stack at full resolution
                        x = (x + self.merge_preds[i][j](preds_instack[j])
                               + self.merge_features[i][j](features_instack[j]))
            pred.append(preds_instack)

        # returned list shape: [nstack * [128*128, 64*64, 32*32, 16*16, 8*8]]
        return pred