Getting the PyTorch backward RuntimeError when creating slices of a tensor

I have run into the infamous RuntimeError: Trying to backward through the graph a second time... error while executing the code given below. I have read the existing posts on this issue but could not resolve it myself. Passing retain_graph=True to backward() fixes the issue in the provided snippet; however, the snippet is only an oversimplified version of a much larger network, where retain_graph=True changes the error to the following:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [3000, 512]], which is output 0 of TBackward, is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
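
For reference, the only change that makes the snippet below run to completion is passing retain_graph=True on every call, i.e. the loop at the end of the snippet becomes:

for i in batched_feats:
    i = i + 5
    summed = torch.sum(i)
    summed.backward(retain_graph=True)  # keep the shared graph alive across iterations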

I tried setting torch.autograd.set_detect_anomaly(True) to pinpoint the failing operation, but nothing I tried based on its output resolved it and the error persisted.
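
For completeness, anomaly detection was enabled like this before the forward pass (this is only the setting, not part of the snippet below):

import torch

torch.autograd.set_detect_anomaly(True)  # record forward stacks so backward failures also point at the forward op
# ... build the graph and call backward() as usual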

I suspect that if the error in the current snippet can be fixed without retain_graph=True, then the same solution should work for the larger network as well.

Therefore, I want to understand why backward() works fine for the first two tensors in batched_feats but fails for the third one. I would really appreciate it if someone could point out where an intermediate result that has already been freed is being reused.
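
To make the question concrete, this is my understanding of the generic pattern behind the error, as a minimal sketch (x and h are made-up names, not tensors from my network):

import torch

x = torch.randn(4, requires_grad=True)
h = x.exp()            # intermediate op that saves a tensor for its backward pass
a, b = h[:2], h[2:]    # two slices sharing the same upstream graph through h

a.sum().backward()     # frees the saved buffers of the shared graph
b.sum().backward()     # should raise the same "backward through the graph a second time" error

What I cannot see is which intermediate in my own snippet plays the role of h here, given that concat_matrix() already returns a clone() of its result.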

# Code Snippet
import numpy as np
import torch

def batch_matrix(vector_pairs, factor=2):
    # Split the (seq_len, seq_len, 2 * dim_vec) pairwise matrix into factor x factor tiles
    baselen = len(vector_pairs[0]) // factor
    split_batch = []

    for j in range(factor):
        for i in range(factor):
            start_j = j * baselen
            end_j = (j+1) * baselen if j != factor - 1 else None
            start_i = i * baselen
            end_i = (i+1) * baselen if i != factor - 1 else None

            mini_pairs = vector_pairs[start_j:end_j, start_i:end_i, :]
            split_batch.append(mini_pairs)
    return split_batch

def concat_matrix(vectors_):
    # Build all pairwise row concatenations: output is (seq_len, seq_len, 2 * dim_vec)
    vectors = vectors_.clone()
    seq_len, dim_vec = vectors.shape
    project_x = vectors.repeat((1, 1, seq_len)).reshape(seq_len, seq_len, dim_vec)
    project_y = project_x.permute(1, 0, 2)
    matrix = torch.cat((project_x, project_y), dim=-1)
    matrix_ = matrix.clone()

    return matrix_

if __name__ == "__main__":
    vector_list = []
    for i in range(10):
        vector_list.append(torch.randn((5,), requires_grad=True))
    vectors = torch.stack(vector_list, dim=0)
    pmatrix = concat_matrix(vectors)

    factor = np.ceil(vectors.shape[0]/6).astype(int)  # factor = 2 for these 10 vectors
    batched_feats = batch_matrix(pmatrix, factor=factor)  # four (5, 5, 10) tiles

    for i in batched_feats:
        i = i + 5
        print(i.shape)
        summed = torch.sum(i)
        summed.backward()  # succeeds for the first two tiles, fails on the third

The code produces the following output and error trace:

torch.Size([5, 5, 10])
torch.Size([5, 5, 10])
Traceback (most recent call last):
  File "/home/user/PycharmProjects/project/run.py", line 43, in <module>
    summed.backward()
  File "/home/user/anaconda3/envs/diff/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/user/anaconda3/envs/diff/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.