Calculating the loss consumes a lot of CPU RAM

I wrote a custom loss function and trained my model on the GPU, but while the loss is being calculated, CPU RAM usage keeps rising. I would like to know what causes this. (Is it related to the computation graph?)

The following code reproduces the problem; my loss function performs similar operations.

import torch
from tqdm import tqdm

# Accumulator that does not require gradients itself.
b = torch.zeros(5).cuda()

# Eight small tensors whose products require gradients.
d1 = 3 * torch.zeros(1, requires_grad=True).cuda()
d2 = 3 * torch.zeros(1, requires_grad=True).cuda()
d3 = 3 * torch.zeros(1, requires_grad=True).cuda()
d4 = 3 * torch.zeros(1, requires_grad=True).cuda()
d5 = 3 * torch.zeros(1, requires_grad=True).cuda()
d6 = 3 * torch.zeros(1, requires_grad=True).cuda()
d7 = 3 * torch.zeros(1, requires_grad=True).cuda()
d8 = 3 * torch.zeros(1, requires_grad=True).cuda()

for i in tqdm(range(1000000)):
    b += (d1 * d2 * d3 * d4 * d5 * d6 * d7 * d8)  # CPU RAM grows with each iteration

Yes, a new computation graph is created and attached to b in each iteration, so all of the metadata needed for backpropagation is retained as well. How large is the increase per iteration?
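For illustration, here is a minimal sketch of the difference between accumulating the result with its graph attached and accumulating a detached copy. Whether .detach() is applicable depends on the use case; the assumption here is that gradients do not need to flow through the running sum (which is not stated in this thread).

import torch

d = [3 * torch.zeros(1, requires_grad=True).cuda() for _ in range(8)]

# Pattern 1: each in-place addition attaches the new product's graph to b,
# so b's autograd history grows by one set of nodes per iteration.
b = torch.zeros(5).cuda()
for _ in range(1000):
    prod = d[0] * d[1] * d[2] * d[3] * d[4] * d[5] * d[6] * d[7]
    b += prod              # graph (and its metadata) is retained

# Pattern 2 (assumption: gradients through the running sum are not needed,
# e.g. it is only accumulated for logging): detach before accumulating,
# so no graph is kept and host memory stays flat.
b_log = torch.zeros(5).cuda()
for _ in range(1000):
    prod = d[0] * d[1] * d[2] * d[3] * d[4] * d[5] * d[6] * d[7]
    b_log += prod.detach()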

Sorry for the late reply.
After 1,000,000 iterations, RAM usage increased by 12.2 GB.

Thanks for checking! I would assume this increase is expected, as it comes down to 12.2 * 1024**2 / 1000000 ≈ 12.79 kB per iteration, which might fit the metadata. CC @albanD to correct me in case ~12 kB per iteration is unexpected.
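For reference, one way to check the per-iteration growth directly is to track the process RSS, e.g. with psutil (a sketch; psutil is an assumption here and not necessarily how the number above was obtained):

import os
import psutil
import torch

proc = psutil.Process(os.getpid())

b = torch.zeros(5).cuda()
d = [3 * torch.zeros(1, requires_grad=True).cuda() for _ in range(8)]

n = 100000
rss_start = proc.memory_info().rss
for _ in range(n):
    b += d[0] * d[1] * d[2] * d[3] * d[4] * d[5] * d[6] * d[7]
rss_end = proc.memory_info().rss

print(f"{(rss_end - rss_start) / n / 1024:.2f} kB per iteration")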

Thank you for your reply. If my custom loss function requires a similar calculation (and ultimately needs backpropagation), is there a better way to write the program so that it uses less RAM?
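One pattern that avoids retaining the graphs across iterations is to compute the loss and call backward() inside each step (which frees that step's graph) and to accumulate only detached values for logging. A minimal sketch, assuming a hypothetical model, optimizer, and custom_loss, none of which come from this thread:

import torch

model = torch.nn.Linear(10, 5).cuda()             # hypothetical model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def custom_loss(output):                          # hypothetical stand-in loss
    return (output ** 2).mean()

running_loss = 0.0
for step in range(1000):
    x = torch.randn(32, 10, device="cuda")        # hypothetical input batch
    optimizer.zero_grad()
    loss = custom_loss(model(x))
    loss.backward()                               # the graph for this step is freed here
    optimizer.step()
    running_loss += loss.item()                   # .item() returns a plain number, so no graph is kept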