GPU memory doubled with a similar expression?

Hi,
the task is a classification problem with a ResNet, and my goal is to backpropagate only the smallest loss in a batch. However, there is a huge difference in GPU memory usage between two similar expressions, and I don't know why.
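
Both expressions should pick the same scalar out of the per-sample loss vector. A quick sanity check with made-up numbers:

import torch

loss = torch.tensor([0.5, 0.25, 1.5])  # pretend per-sample losses for a batch of 3
a = torch.min(loss)                    # full reduction over all elements
b, idx = torch.min(loss, 0)            # reduction along dim 0, plus the argmin
print(a.item(), b.item(), idx.item())  # 0.25 0.25 1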

The first version is:

Loss_function = nn.CrossEntropyLoss(reduce=False)  # per-sample losses (reduction='none' in newer PyTorch)

for epoch in range(epoch_num):
    for iter_num, data in enumerate(train_loader):
        image, label = data
        image = Variable(image).cuda(cuda_id)
        label = Variable(label).cuda(cuda_id)
        out = resnet50(image)
        loss = Loss_function(out, label)  # shape: (batch_size,)
        loss_min = torch.min(loss)        # full reduction over all elements

        optimizer.zero_grad()
        loss_min.backward()
        optimizer.step()

The second version is:

for epoch in range(epoch_num):
    for iter_num, data in enumerate(train_loader):
        image, label = data
        image = Variable(image).cuda(cuda_id)
        label = Variable(label).cuda(cuda_id)
        out = resnet50(image)
        loss = Loss_function(out, label)  # shape: (batch_size,)
        loss_min, _ = torch.min(loss, 0)  # reduction along dim 0, discards the argmin

        optimizer.zero_grad()
        loss_min.backward()
        optimizer.step()

The first version takes about twice as much GPU memory as the second. Why is that?
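
For reference, here is roughly how the peak memory of one training step can be compared between the two variants. This is a minimal sketch using recent PyTorch memory APIs; it reuses resnet50, Loss_function, optimizer, and one (image, label) batch from the snippets above:

import torch

def peak_mem_mb(step, device):
    # Reset the peak-allocation counter, run one training step,
    # then report the peak allocated GPU memory in MB.
    torch.cuda.reset_peak_memory_stats(device)
    step()
    torch.cuda.synchronize(device)
    return torch.cuda.max_memory_allocated(device) / 1024 ** 2

def step_full_reduction():
    out = resnet50(image)
    loss = Loss_function(out, label)
    loss_min = torch.min(loss)        # variant 1
    optimizer.zero_grad()
    loss_min.backward()
    optimizer.step()

def step_dim_reduction():
    out = resnet50(image)
    loss = Loss_function(out, label)
    loss_min, _ = torch.min(loss, 0)  # variant 2
    optimizer.zero_grad()
    loss_min.backward()
    optimizer.step()

print('full reduction :', peak_mem_mb(step_full_reduction, cuda_id), 'MB')
print('dim-0 reduction:', peak_mem_mb(step_dim_reduction, cuda_id), 'MB')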