Storing intermediate outputs in a Python list with clone() doubles memory usage

Hi,
I have a CNN block composed of a graph of conv layers. During the forward pass of this block, the intermediate layer outputs are stored in a list of lists of tensors (representing the graph matrix). While testing I got a runtime error that I believe is related to in-place operations and gradient computation. I resolved it by storing a clone() of each intermediate output, but now each block uses twice as much memory during the forward pass.
Is there another way to solve this? Any suggestions on how to modify the code would be welcome.
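
For reference, the error was the usual in-place autograd error. A minimal sketch that triggers the same message (the layers here are hypothetical, not my actual block):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 8, 8, requires_grad=True)
y = torch.sigmoid(nn.Conv2d(3, 3, 3, padding=1)(x))  # sigmoid saves its output for backward
z = nn.ReLU(inplace=True)(y)                          # overwrites y in place
z.sum().backward()
# RuntimeError: one of the variables needed for gradient computation
# has been modified by an inplace operation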

import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self):
        super(Block, self).__init__()
        ...

    def forward(self, X):
        # one list of incoming tensors per node of the block graph
        intermediates = [[] for _ in range(self.block["num_nodes"])]

        # clone() avoids the in-place/autograd error but doubles memory
        intermediates[0].append(X.clone())

        for i in range(self.block["num_nodes"]):
            # combine everything that arrived at node i along the channel dim
            # (same combination as the final torch.cat below)
            Z = torch.cat(intermediates[i], dim=1)
            for j in range(self.block["num_nodes"]):
                if self.layers[i][j] is not None:
                    branches = self.layers[i][j]  # parallel branches on edge i -> j
                    for branch in branches:
                        intermediates[j].append(branch(Z).clone())

        # the last node collects the block's output
        output1 = torch.cat(intermediates[self.block["num_nodes"] - 1], dim=1)
        output2 = self.pool_op(output1)
        return output2
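
For completeness, this is roughly how the block is used when the error shows up (the batch size and channel counts below are made up, not my real config):

block = Block()
x = torch.randn(8, 16, 32, 32)
out = block(x)
out.sum().backward()  # this backward raised the in-place RuntimeError before I added the clone() calls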