output = model(input)
loss = lossfunc1(output[0], target1)
loss += lossfunc2(output[1], target1)
loss += lossfunc2(output[2], target1)
loss.backward()
and the second, where I had made a mistake:
output = model(input)
loss = lossfunc1(output[0], target1)
loss = 0
loss += lossfunc2(output[1], target1)
loss += lossfunc2(output[2], target1)
loss.backward()
The first consumes 7.8 GB of GPU memory, while the second consumes 9.5 GB! When I realized I had made a mistake (the second snippet), I thought GPU memory consumption would increase once I fixed it, because the corrected version would also backprop through lossfunc1 and the output[0] variable.
But as you can see, it's the complete opposite! Can someone explain?
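For reference, here is a minimal self-contained sketch of the two variants. The model, shapes, and loss functions are hypothetical stand-ins (the real ones aren't shown above); the only point is to reproduce the structural difference: in the second variant, `loss = 0` discards the `lossfunc1` tensor, so only the two `lossfunc2` terms are backpropagated.

```python
import torch
import torch.nn as nn

# Hypothetical toy model with three output heads, standing in for the
# real model in the snippets above.
class ThreeHeadModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)
        self.heads = nn.ModuleList(nn.Linear(8, 4) for _ in range(3))

    def forward(self, x):
        h = self.backbone(x)
        return [head(h) for head in self.heads]

model = ThreeHeadModel()
input = torch.randn(2, 8)
target1 = torch.randn(2, 4)
lossfunc1 = nn.MSELoss()   # assumed losses, for illustration only
lossfunc2 = nn.L1Loss()

# Variant 1 (intended): all three loss terms contribute to backward().
output = model(input)
loss = lossfunc1(output[0], target1)
loss += lossfunc2(output[1], target1)
loss += lossfunc2(output[2], target1)
loss.backward()

# Variant 2 (the mistake): `loss = 0` drops the reference to the
# lossfunc1 result, so its graph never receives a backward pass and
# only the two lossfunc2 terms are backpropagated.
model.zero_grad()
output = model(input)
loss = lossfunc1(output[0], target1)
loss = 0
loss += lossfunc2(output[1], target1)
loss += lossfunc2(output[2], target1)
loss.backward()
```

One way to see the difference: after the second variant, head 0 receives no gradient at all, while heads 1 and 2 do.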