PyTorch 1.12.1 RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

Hi all. I'm running into a problem with loss.backward(), as you can see here:
```
File "train.py", line 360, in train
  loss_adv.backward(retain_graph=True)
File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 396, in backward
  torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 175, in backward
  allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 7]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
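
As the hint in the error message suggests, anomaly detection can point at the forward-pass operation whose output was later modified in place. A minimal sketch of enabling it (placed once, before the training loop):

```python
import torch

# When the backward pass fails, autograd will additionally print the
# traceback of the forward operation that produced the offending tensor.
torch.autograd.set_detect_anomaly(True)
```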

My code is:


I'm using PyTorch 1.12.1 in Google Colab.
Can anyone help me solve this problem? Thank you very much.

Hi, could you please specify why you are using retain_graph=True?

This argument can sometimes be the source of this error. If it isn't required, try removing it and see if that helps.

Otherwise, please post a minimal executable snippet enclosed within ```.

When I don't use retain_graph=True, I get this error instead:

```
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
```
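
That second error is expected: each backward() call frees the graph's saved intermediate tensors unless retain_graph=True is passed, so backpropagating twice through one forward pass requires it. A small standalone illustration (toy tensors, not the actual model):

```python
import torch

x = torch.randn(3, requires_grad=True)

y = (x * 2).sum()
y.backward(retain_graph=True)  # keep the saved tensors alive
y.backward()                   # second pass works; grads accumulate in x.grad

z = (x * 2).sum()
z.backward()                   # frees the graph
# z.backward()  # would raise: "Trying to backward through the graph a second time"
```

So retain_graph=True is genuinely needed here; the original in-place error must come from something mutating a tensor between the backward calls.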




```python
def train(step):
    global dataiter_srcs

    ## Initialize iteration
    model.train()
    
    scheduler.step()
    if args.sagnet:
        scheduler_style.step()
        scheduler_adv.step()

    ## Load data
    tic = time.time()
    
    n_srcs = len(args.sources)
    if step == 0:
        dataiter_srcs = [None] * n_srcs
    data = [None] * n_srcs
    label = [None] * n_srcs
    for i in range(n_srcs):
        if step % len(loader_srcs[i]) == 0:
            dataiter_srcs[i] = iter(loader_srcs[i])
        data[i], label[i] = next(dataiter_srcs[i])

    data = torch.cat(data)
    label = torch.cat(label)
    rand_idx = torch.randperm(len(data))
    data = data[rand_idx]
    label = label[rand_idx].cuda()
    
    time_data = time.time() - tic

    ## Process batch
    tic = time.time()

    # forward
    y, y_style = model(data)
        
    if args.sagnet:
        # learn style
        loss_style = criterion(y_style, label)
        optimizer_style.zero_grad()
        loss_style.backward(retain_graph=True)
        optimizer_style.step()
    
        # learn style_adv
        loss_adv = args.w_adv * criterion_adv(y_style)
        optimizer_adv.zero_grad()
        loss_adv.backward(retain_graph=True)
        if args.clip_adv is not None:
            torch.nn.utils.clip_grad_norm_(model.module.adv_params(), args.clip_adv)
        optimizer_adv.step()
    
    # learn content
    loss = criterion(y, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    time_net = time.time() - tic

    ## Update status
    status['iteration'] = step + 1
    status['lr'] = optimizer.param_groups[0]['lr']
    status['src']['t_data'].update(time_data)
    status['src']['t_net'].update(time_net)
    status['src']['l_c'].update(loss.item())
    if args.sagnet:
        status['src']['l_s'].update(loss_style.item())
        status['src']['l_adv'].update(loss_adv.item())
    status['src']['acc'].update(compute_accuracy(y, label))

    ## Log result
    if step % args.log_interval == 0:
        print('[{}/{} ({:.0f}%)] lr {:.5f}, {}'.format(
            step, args.iterations, 100. * step / args.iterations, status['lr'],
            ', '.join(['{} {}'.format(k, v) for k, v in status['src'].items()])))

```

This is my train function. Can you help me find the error? Thank you very much!

Double post from here: [Solved][Pytorch1.5] RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation - #39 by ptrblck
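
The linked post describes the likely root cause, and it matches the code above: optimizer_style.step() updates the model's parameters in place after the shared forward pass, so when loss_adv.backward(retain_graph=True) later traverses the same graph, the saved parameter tensors are at a newer version than autograd recorded. A minimal sketch reproducing the mechanism with a toy model (hypothetical module and shapes, not the poster's network):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4, requires_grad=True)
out = model(x)  # one forward pass shared by two losses

loss1 = out.pow(2).mean()
loss1.backward(retain_graph=True)
opt.step()  # in-place parameter update bumps the weight's version counter

loss2 = out.mean()
loss2.backward()  # RuntimeError: ... modified by an inplace operation
```

One common rearrangement is to run all backward() calls through the shared graph before any optimizer step (keeping an eye on which parameter groups the accumulated gradients land in), or to recompute the forward pass after each step.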