Difference in memory consumption between two computational graphs

output = model(input)

loss = lossfunc1(output[0], target1)
loss += lossfunc2(output[1], target1)
loss += lossfunc2(output[2], target1)

loss.backward()

and the second, where I had made a mistake:

output = model(input)

loss = lossfunc1(output[0], target1)
loss = 0  # mistake: this overwrites the first loss before it is ever used
loss += lossfunc2(output[1], target1)
loss += lossfunc2(output[2], target1)

loss.backward()

The first version consumes 7.8 GB of GPU memory, while the second consumes 9.5 GB! When I realized I had made a mistake (the second version) and fixed it, I thought GPU memory consumption would increase, because the backward pass would now also run through lossfunc1 and the output[0] variable.
But as you can see, it's the complete opposite! Can someone explain?
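In case it helps, here is a minimal, self-contained sketch for comparing the peak memory of the two versions. The MultiHead model, the loss functions, and the tensor shapes below are made-up stand-ins for my setup (so the absolute numbers will differ); only the measurement pattern with torch.cuda.reset_peak_memory_stats() and torch.cuda.max_memory_allocated() is the point:

import torch
import torch.nn as nn

# Hypothetical stand-in for my model: a shared trunk feeding three output heads.
class MultiHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Linear(512, 512)
        self.heads = nn.ModuleList(nn.Linear(512, 10) for _ in range(3))

    def forward(self, x):
        h = torch.relu(self.trunk(x))
        return [head(h) for head in self.heads]

device = "cuda"
model = MultiHead().to(device)
input = torch.randn(64, 512, device=device)
target1 = torch.randint(0, 10, (64,), device=device)
lossfunc1 = nn.CrossEntropyLoss()
lossfunc2 = nn.CrossEntropyLoss()

torch.cuda.reset_peak_memory_stats()

output = model(input)
loss = lossfunc1(output[0], target1)
# loss = 0                            # uncomment to reproduce the mistaken version
loss += lossfunc2(output[1], target1)
loss += lossfunc2(output[2], target1)
loss.backward()

print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 2**30:.3f} GiB")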