RuntimeError when backpropagating through the graph a second time

Hi,

I am implementing a GNN that uses Chebyshev polynomials. I have a for loop that applies the recurrence relation between Chebyshev polynomials:

# for k = 0, T_0(L) x = x
Tx = T0_x = graphs.clone()

if self.cheby_K > 1:  # T_1(L) x = L x
    T1_x = laplacians @ graphs
    Tx = torch.cat((T0_x, T1_x), dim=-1)

# T_k(L) x = 2 * L * T_{k-1}(L) x - T_{k-2}(L) x
for k in range(2, self.cheby_K):
    Tk_x = 2 * laplacians @ T1_x - T0_x
    Tx = torch.cat((Tx, Tk_x), dim=-1)
    T1_x, T0_x = Tk_x, T1_x
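
For reference, a minimal, self-contained sketch of the same recurrence on dummy tensors (the shapes, the cheby_K value, and the tensor names here are illustrative only, not taken from my actual model); stacking along the last dimension should give a (1, N, cheby_K * F) tensor:

import torch

cheby_K = 4                        # assumed polynomial order, for illustration
N, F = 8, 32                       # assumed number of nodes and feature size
graphs = torch.randn(1, N, F)      # dummy node features
laplacians = torch.randn(1, N, N)  # dummy (scaled) Laplacian

# T_0(L) x = x
Tx = T0_x = graphs.clone()

if cheby_K > 1:  # T_1(L) x = L x
    T1_x = laplacians @ graphs
    Tx = torch.cat((T0_x, T1_x), dim=-1)

# T_k(L) x = 2 * L * T_{k-1}(L) x - T_{k-2}(L) x
for k in range(2, cheby_K):
    Tk_x = 2 * laplacians @ T1_x - T0_x
    Tx = torch.cat((Tx, Tk_x), dim=-1)
    T1_x, T0_x = Tk_x, T1_x

print(Tx.shape)  # expected: torch.Size([1, 8, 128]) = (1, N, cheby_K * F)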

When I test my network with the following snippet:

with torch.autograd.set_detect_anomaly(True):
    with torch.enable_grad():

        graphs = torch.unsqueeze(nodes, 0).contiguous().to(device)
        laplacians = torch.unsqueeze(laplacian, 0).to(device)

        for i in range(0, 100):

            optimizer.zero_grad()

            logits, out_nodes, out_laplacians = gnn(graphs, laplacians)

            loss = criterion(logits, labels)
            print('epoch: {:d}, loss: {:.4f}'.format(i, loss))

            probs = torch.softmax(logits, dim=-1)
            print('predictions:\n', probs.detach().to(torch.float16).cpu().numpy())

            loss.backward()
            optimizer.step()

I receive the following error message:

“RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling .backward() or autograd.grad() the first time.”

According to the error, I must specify retain_graph=True when calling .backward() in the loop above, but then the training code raises another RuntimeError:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [8, 32]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

I don't use any hidden state. Also, if I set self.cheby_K=1, the training runs without an error.
Could you help me figure out what I am doing wrong?

Thanks

Edit: typo

Isn't the line responsible for the error pointed out in the full error message? If so, please post it.

The following is the whole error message when I don't use retain_graph=True in .backward():

C:\Users\I3ase\miniconda3\lib\site-packages\torch\autograd\__init__.py:145: UserWarning: Error detected in ReluBackward0. Traceback of forward call that caused the error:
File “C:\Users\I3ase\miniconda3\lib\runpy.py”, line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File “C:\Users\I3ase\miniconda3\lib\runpy.py”, line 87, in _run_code
exec(code, run_globals)
File "C:\Users\I3ase\miniconda3\lib\site-packages\spyder_kernels\console_main
.py", line 23, in
start.main()
File “C:\Users\I3ase\miniconda3\lib\site-packages\spyder_kernels\console\start.py”, line 296, in main
kernel.start()
File “C:\Users\I3ase\miniconda3\lib\site-packages\ipykernel\kernelapp.py”, line 612, in start
self.io_loop.start()
File “C:\Users\I3ase\miniconda3\lib\site-packages\tornado\platform\asyncio.py”, line 199, in start
self.asyncio_loop.run_forever()
File “C:\Users\I3ase\miniconda3\lib\asyncio\base_events.py”, line 596, in run_forever
self._run_once()
File “C:\Users\I3ase\miniconda3\lib\asyncio\base_events.py”, line 1890, in _run_once
handle._run()
File “C:\Users\I3ase\miniconda3\lib\asyncio\events.py”, line 80, in _run
self._context.run(self._callback, *self._args)
File “C:\Users\I3ase\miniconda3\lib\site-packages\tornado\ioloop.py”, line 688, in <lambda>
lambda f: self._run_callback(functools.partial(callback, future))
File “C:\Users\I3ase\miniconda3\lib\site-packages\tornado\ioloop.py”, line 741, in _run_callback
ret = callback()
File “C:\Users\I3ase\miniconda3\lib\site-packages\tornado\gen.py”, line 814, in inner
self.ctx_run(self.run)
File “C:\Users\I3ase\miniconda3\lib\site-packages\tornado\gen.py”, line 775, in run
yielded = self.gen.send(value)
File “C:\Users\I3ase\miniconda3\lib\site-packages\ipykernel\kernelbase.py”, line 365, in process_one
yield gen.maybe_future(dispatch(*args))
File “C:\Users\I3ase\miniconda3\lib\site-packages\tornado\gen.py”, line 234, in wrapper
yielded = ctx_run(next, result)
File “C:\Users\I3ase\miniconda3\lib\site-packages\ipykernel\kernelbase.py”, line 268, in dispatch_shell
yield gen.maybe_future(handler(stream, idents, msg))
File “C:\Users\I3ase\miniconda3\lib\site-packages\tornado\gen.py”, line 234, in wrapper
yielded = ctx_run(next, result)
File “C:\Users\I3ase\miniconda3\lib\site-packages\ipykernel\kernelbase.py”, line 543, in execute_request
self.do_execute(
File “C:\Users\I3ase\miniconda3\lib\site-packages\tornado\gen.py”, line 234, in wrapper
yielded = ctx_run(next, result)
File “C:\Users\I3ase\miniconda3\lib\site-packages\ipykernel\ipkernel.py”, line 306, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File “C:\Users\I3ase\miniconda3\lib\site-packages\ipykernel\zmqshell.py”, line 536, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File “C:\Users\I3ase\miniconda3\lib\site-packages\IPython\core\interactiveshell.py”, line 2894, in run_cell
result = self._run_cell(
File “C:\Users\I3ase\miniconda3\lib\site-packages\IPython\core\interactiveshell.py”, line 2940, in _run_cell
return runner(coro)
File “C:\Users\I3ase\miniconda3\lib\site-packages\IPython\core\async_helpers.py”, line 68, in pseudo_sync_runner
coro.send(None)
File “C:\Users\I3ase\miniconda3\lib\site-packages\IPython\core\interactiveshell.py”, line 3165, in run_cell_async
has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
File “C:\Users\I3ase\miniconda3\lib\site-packages\IPython\core\interactiveshell.py”, line 3357, in run_ast_nodes
if (await self.run_code(code, result, async_=asy)):
File “C:\Users\I3ase\miniconda3\lib\site-packages\IPython\core\interactiveshell.py”, line 3437, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File “”, line 1, in <module>
runfile(‘C:/Users/I3ase/Documents/projects/agcn/unit_test_sgc_ll.py’, wdir=‘C:/Users/I3ase/Documents/projects/agcn’)
File “C:\Users\I3ase\miniconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py”, line 565, in runfile
exec_code(file_code, filename, ns_globals, ns_locals,
File “C:\Users\I3ase\miniconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py”, line 453, in exec_code
exec(compiled, ns_globals, ns_locals)
File “C:\Users\I3ase\Documents\projects\agcn\unit_test_sgc_ll.py”, line 96, in <module>
logits, out_nodes, out_laplacians = gnn(graphs,laplacians)
File “C:\Users\I3ase\miniconda3\lib\site-packages\torch\nn\modules\module.py”, line 889, in _call_impl
result = self.forward(*input, **kwargs)
File “C:\Users\I3ase\Documents\projects\agcn\unit_test_sgc_ll.py”, line 24, in forward
graphs2, laplacians2 = self.layer2(graphs1, laplacians1)
File “C:\Users\I3ase\miniconda3\lib\site-packages\torch\nn\modules\module.py”, line 889, in _call_impl
result = self.forward(*input, **kwargs)
File “C:\Users\I3ase\Documents\projects\agcn\sgc_ll.py”, line 183, in forward
out_graphs = F.relu(res_graph + sc_graph)
File “C:\Users\I3ase\miniconda3\lib\site-packages\torch\nn\functional.py”, line 1206, in relu
result = torch.relu(input)
(Triggered internally at …\torch\csrc\autograd\python_anomaly_mode.cpp:104.)
Variable._execution_engine.run_backward(
Traceback (most recent call last):

File “C:\Users\I3ase\Documents\projects\agcn\unit_test_sgc_ll.py”, line 106, in <module>
loss.backward()

File “C:\Users\I3ase\miniconda3\lib\site-packages\torch\tensor.py”, line 245, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)

File “C:\Users\I3ase\miniconda3\lib\site-packages\torch\autograd\__init__.py”, line 145, in backward
Variable._execution_engine.run_backward(

RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling .backward() or autograd.grad() the first time.

In addition, DGL uses a similar loop in its ChebNet implementation.

Hi,

I have solved the problem. I changed the name of the updated Laplacian in the network; I had given both that tensor and the input tensor the same name, ‘laplacians’. In addition, I changed the in-place addition-assignment to an out-of-place addition, i.e.,
laplacians += res_laplacians → u_laplacians = laplacians + res_laplacians
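
In case it helps someone else, here is a toy sketch of why the in-place update broke training. This is not my actual layer: w stands in for a learnable weight and lap for the input Laplacian that gets updated each forward pass.

import torch

w = torch.randn(3, 3, requires_grad=True)   # stand-in for a learnable weight
lap = torch.eye(3)                           # stand-in for the input Laplacian
optimizer = torch.optim.SGD([w], lr=0.1)

for step in range(3):
    optimizer.zero_grad()
    res_lap = lap @ w          # depends on w, so it carries an autograd graph

    # Problematic version: lap was saved for the matmul's backward, so mutating
    # it in place breaks that gradient (the "inplace operation" error), and the
    # mutated lap re-enters the next iteration still attached to the previous,
    # already freed graph (the "backward a second time" error).
    # lap += res_lap

    # Working version: out-of-place addition under a new name leaves the input
    # untouched and creates a fresh tensor each iteration.
    u_lap = lap + res_lap

    loss = u_lap.sum()
    loss.backward()
    optimizer.step()
    print(step, loss.item())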