# How to save computation graph of a gradient?

Hi, how should I save the computation graph of a gradient vector computed from `torch.autograd.grad(loss, model.parameters(), create_graph=True)`?

The background is that I want to compute the Hessian-vector products of `k` vectors: `H V`, where `H` is the Hessian of a neural network with `n` parameters and `V` is a constant matrix with `n` rows and `k` columns. Since `H` is the Jacobian of the gradient `g` of the network's forward function, each column `v` of `V` satisfies `H v = d(g^T v)/d(theta)`, so I compute the gradient of the inner product between `g` (obtained with `create_graph=True`) and each column of `V`, with respect to the network parameters `theta`. An example that works for a tiny network is

```python
import torch

# define the tiny "network": a quadratic function of 10 parameters
class quadratic_fun(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.x = torch.nn.Parameter(torch.randn(5))
        self.y = torch.nn.Parameter(torch.randn(5))

    def forward(self):
        loss = torch.norm(self.x) ** 2 + torch.norm(self.y) ** 2
        return loss

model = quadratic_fun()
loss_quad = model()

# compute the flattened gradient with create_graph=True
grad_ft = torch.autograd.grad(loss_quad, model.parameters(), create_graph=True)
flat_grad = torch.cat([g.contiguous().view(-1) for g in grad_ft])

# generate the constant matrix V, and compute the matrix-gradient product
torch.manual_seed(0)
V = torch.randn((10, 3))
h = flat_grad @ V

# compute the matrix-Jacobian product by iterating over the columns of the constant matrix
for i in range(3):
    hvp = torch.autograd.grad(h[i], model.parameters(), retain_graph=True)
    hvp_flat = torch.cat([g.contiguous().view(-1) for g in hvp])
    print(hvp_flat)
```

which gives

```
tensor([-2.2517, -0.8678, -0.6320, -2.5267,  0.2397, -0.2232, -0.9854,  0.2248,
        -0.2046,  0.1050])
tensor([-2.3047,  1.6974, -4.2304,  0.7000,  2.4753, -1.2272,  0.4968, -1.6821,
         1.5849,  1.0457])
tensor([-0.5012,  1.3840,  0.6445,  0.6163, -0.2869,  0.0632,  0.8794, -4.6321,
        -0.5793,  4.6044])
```

However, this is not feasible on CUDA when `H` is the Hessian of a large neural network: with `retain_graph=True` in the third-from-last line, CUDA memory quickly fills up. If I don't retain the graph, on the other hand, it is freed after one iteration of the for loop, and I would have to recompute the gradient, which is time-consuming. So I wonder whether I can save not only the gradient values but also their associated computation graph (both generated by `grad_ft = torch.autograd.grad(loss_quad, model.parameters(), create_graph=True)`) to a file or buffer, and reload them in a later iteration of the for loop.

Some other posts I looked into but didn’t find an answer:

• This post suggests using JIT, but it is not clear to me how to use that API for the graph of a gradient vector.
• A reply in this post suggests computing the matrix-Jacobian product with `torch.autograd.functional.jacobian`, but it looks like that API only works when the function to differentiate is explicitly defined (see the toy sketch below).
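
For instance, with a toy function of my own (not from the linked post), the API works like this, but I see no way to hand it the graph of a gradient vector instead:

```python
import torch

# the API expects an explicitly defined function of its inputs...
def f(x):
    return x ** 2

# ...so this works for f, but my "function" is itself the output
# of torch.autograd.grad, which has no explicit definition to pass in
J = torch.autograd.functional.jacobian(f, torch.randn(3))
print(J)  # a 3x3 diagonal Jacobian
```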

Thanks!

The short answer is: I think so! You can use saved tensors default hooks.

The docs are still under review (#62362 and #62361), but the functionality is already merged into master as of today!

In particular, the first PR describes exactly your use case, where you want to save a computation graph to disk and retrieve it later when needed.
I think in your case you would want to do something like:

```python
# compute the flattened gradient with create_graph=True and store the graph on disk
torch.autograd.graph.set_saved_tensors_default_hooks(pack_hook, unpack_hook)
model = quadratic_fun()
loss_quad = model()
grad_ft = torch.autograd.grad(loss_quad, model.parameters(), create_graph=True)
flat_grad = torch.cat([g.contiguous().view(-1) for g in grad_ft])
torch.autograd.graph.reset_saved_tensors_default_hooks()

# generate the constant matrix V, and compute the matrix-gradient product
torch.manual_seed(0)
V = torch.randn((10, 3))
h = flat_grad @ V

# compute the matrix-Jacobian product by iterating over the columns of the constant matrix
for i in range(3):
    hvp = torch.autograd.grad(h[i], model.parameters(), retain_graph=True)
    hvp_flat = torch.cat([g.contiguous().view(-1) for g in hvp])
    print(hvp_flat)
```

where `pack_hook` and `unpack_hook` are defined here.

You can control exactly which part of the graph should be saved to disk by adapting the position of the calls to `set_saved_tensors_default_hooks` and `reset_saved_tensors_default_hooks`.

Alternatively, you can use the context manager `torch.autograd.graph.save_on_cpu`, cf. #62410.
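
For example, a minimal sketch of the latter (assuming the same tiny `model` as above):

```python
# inside the context, every tensor saved for backward is moved to CPU
# and copied back to GPU only when the backward pass needs it
with torch.autograd.graph.save_on_cpu():
    loss_quad = model()
    grad_ft = torch.autograd.grad(loss_quad, model.parameters(), create_graph=True)
```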

Thanks Victor for the pointers to the new functionality! Is there a concrete example of what `pack_hook` and `unpack_hook` should look like? I tried your example here, but I have no idea why you use `inc()` and `lambda x: x` as the pack and unpack hooks, and I don't really understand why `f("cpu")` and `f("cuda")` take seemingly arbitrarily large values between the calls that set and reset the hooks.

```python
class SelfClosingTempFile():
    def __init__(self):
        self.fp = tempfile.TemporaryFile()

    def __del__(self):
        self.fp.close()

def pack_hook(tensor):
    sctf = SelfClosingTempFile()
    torch.save(tensor, sctf.fp)
    return sctf

def unpack_hook(sctf):
    sctf.fp.seek(0)
    return torch.load(sctf.fp)
```

Also, what should `tempfile.TemporaryFile()` be?

Hi Craig,

Please ignore my example that you linked; it's a POC of what an incorrect usage of the hooks would look like!
You can use the docs at Autograd mechanics — PyTorch 1.10.0 documentation and at Automatic differentiation package - torch.autograd — PyTorch 1.10.0 documentation.
To use the example you pasted, you need to `import tempfile` (tempfile — Generate temporary files and directories — Python 3.9.6 documentation).

Thanks Victor for the explanation. I imported `tempfile` as you suggested and have the following code that works:

```python
import torch
import tempfile

# define the tiny "network"
class quadratic_fun(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.x = torch.nn.Parameter(torch.randn(5))
        self.y = torch.nn.Parameter(torch.randn(5))

    def forward(self):
        loss = torch.norm(self.x) ** 2 + torch.norm(self.y) ** 2
        return loss

class SelfClosingTempFile():
    def __init__(self):
        self.fp = tempfile.TemporaryFile()

    def __del__(self):
        self.fp.close()

def pack_hook(tensor):
    sctf = SelfClosingTempFile()
    torch.save(tensor, sctf.fp)
    return sctf

def unpack_hook(sctf):
    sctf.fp.seek(0)
    return torch.load(sctf.fp)

# save the graph to disk while it is being created
torch.autograd.graph.set_saved_tensors_default_hooks(pack_hook, unpack_hook)
model = quadratic_fun()
loss_quad = model()
grad_ft = torch.autograd.grad(loss_quad, model.parameters(), create_graph=True)
flat_grad = torch.cat([g.contiguous().view(-1) for g in grad_ft])
torch.autograd.graph.reset_saved_tensors_default_hooks()

# generate the constant matrix V, and compute the matrix-gradient product
torch.manual_seed(0)
V = torch.randn((10, 3))
h = flat_grad @ V

# compute the matrix-Jacobian product by iterating over the columns of the constant matrix
for i in range(3):
    hvp = torch.autograd.grad(h[i], model.parameters(), retain_graph=True)
    hvp_flat = torch.cat([g.contiguous().view(-1) for g in hvp])
    print(hvp_flat)
```

However, I want to set `retain_graph` to `False` in the third-from-last line: I cannot retain the graph because of memory limits (and that's exactly why I want to save the graph when I create it). If I remove `retain_graph=True` from the above code, I still get

```
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
```

which is the same error as without the hook functions. Is it because I put `reset_saved_tensors_default_hooks` in the wrong position? How should I arrange the hook calls to do what I want?

What if you call `torch.autograd.graph.reset_saved_tensors_default_hooks()` after the `for` loop instead (but keep the `retain_graph` option on)? Does that still exceed your memory limits?
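
Something like this (a sketch based on the code you posted):

```python
torch.autograd.graph.set_saved_tensors_default_hooks(pack_hook, unpack_hook)
model = quadratic_fun()
loss_quad = model()
grad_ft = torch.autograd.grad(loss_quad, model.parameters(), create_graph=True)
flat_grad = torch.cat([g.contiguous().view(-1) for g in grad_ft])

torch.manual_seed(0)
V = torch.randn((10, 3))
h = flat_grad @ V

for i in range(3):
    hvp = torch.autograd.grad(h[i], model.parameters(), retain_graph=True)

# reset only once the loop is done using the saved graph
torch.autograd.graph.reset_saved_tensors_default_hooks()
```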

Hi Victor, sorry for getting back to you late (it took me some time to install the latest PyTorch onto a machine with CUDA).

In my code that does the actual training (not the example code above), I moved `torch.autograd.graph.reset_saved_tensors_default_hooks()` to after the `for` loop, but got

```
...
AttributeError: 'SelfClosingTempFile' object has no attribute 'fp'
...
RuntimeError: OSError: [Errno 24] Too many open files: '/tmp/tmpw5thrpry'
```

I have no control over how many files the process can open at once, though, and I did not find a way to limit the number of files created by the `SelfClosingTempFile` class (or by `tempfile.TemporaryFile()`). Do you know if there is a workaround? Thanks!

Hi Craig,

Thanks for taking the time to test this new functionality! You can try to increase the limit on the number of files that can be opened; see for example Python Subprocess: Too Many Open Files - Stack Overflow.
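
If you cannot change the limit from the shell, here is a sketch of raising it from inside the process with the standard `resource` module (Unix only):

```python
import resource

# raise the soft limit on open file descriptors up to the hard limit
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```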

Edit: actually, I’ll provide you with another version of the hooks that should handle this issue.

Thanks for the pointer! After setting `ulimit -Sn 500000`, there now seems to be a problem with writing and reading the temp files when I run my code with `torch.autograd.graph.reset_saved_tensors_default_hooks()` after the `for` loop:

```
terminate called after throwing an instance of 'c10::Error'
what():  [enforce fail at inline_container.cc:300] . unexpected pos 320 vs 252
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x47 (0x7f627ac4e4b7 in /home/craig/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x25844b0 (0x7f62c2c164b0 in /home/craig/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #2: <unknown function> + 0x257fa8c (0x7f62c2c11a8c in /home/craig/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #3: caffe2::serialize::PyTorchStreamWriter::writeRecord(std::string const&, void const*, unsigned long, bool) + 0xb5 (0x7f62c2c196f5 in /home/craig/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #4: caffe2::serialize::PyTorchStreamWriter::writeEndOfFile() + 0x173 (0x7f62c2c199e3 in /home/craig/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #5: caffe2::serialize::PyTorchStreamWriter::~PyTorchStreamWriter() + 0x125 (0x7f62c2c19c55 in /home/craig/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0xb1cec3 (0x7f62d5797ec3 in /home/craig/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x558988 (0x7f62d51d3988 in /home/craig/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #8: <unknown function> + 0x559c8e (0x7f62d51d4c8e in /home/craig/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #12: <unknown function> + 0x5548f5 (0x7f62d51cf8f5 in /home/craig/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #13: <unknown function> + 0xaa175 (0x7f62d65c0175 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #14: <unknown function> + 0xfbf034 (0x7f62c1651034 in /home/craig/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #15: <unknown function> + 0x37083df (0x7f62c3d9a3df in /home/craig/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #16: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x46 (0x7f62d51cab66 in /home/craig/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #20: <unknown function> + 0xd6de4 (0x7f62d65ecde4 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #21: <unknown function> + 0x9609 (0x7f62e658b609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #22: clone + 0x43 (0x7f62e64b2293 in /lib/x86_64-linux-gnu/libc.so.6)
``````

Do you have any idea what the cause might be?

This seems to be a serialization error. Do you get the same error with these hooks:

```python
import torch
import os
import uuid

tmp_dir = "temp"
os.makedirs(tmp_dir, exist_ok=True)  # the directory must exist before saving

# define the tiny "network"
class quadratic_fun(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.x = torch.nn.Parameter(torch.randn(5))
        self.y = torch.nn.Parameter(torch.randn(5))

    def forward(self):
        loss = torch.norm(self.x) ** 2 + torch.norm(self.y) ** 2
        return loss

class SelfDeletingTempFile():
    def __init__(self):
        self.name = os.path.join(tmp_dir, str(uuid.uuid4()))

    def __del__(self):
        os.remove(self.name)

def pack_hook(tensor):
    temp_file = SelfDeletingTempFile()
    torch.save(tensor, temp_file.name)
    return temp_file

def unpack_hook(temp_file):
    return torch.load(temp_file.name)

# save the graph to disk while it is being created
torch.autograd.graph.set_saved_tensors_default_hooks(pack_hook, unpack_hook)
model = quadratic_fun()
loss_quad = model()
grad_ft = torch.autograd.grad(loss_quad, model.parameters(), create_graph=True)
flat_grad = torch.cat([g.contiguous().view(-1) for g in grad_ft])
torch.autograd.graph.reset_saved_tensors_default_hooks()

# generate the constant matrix V, and compute the matrix-gradient product
torch.manual_seed(0)
V = torch.randn((10, 3))
h = flat_grad @ V

# compute the matrix-Jacobian product by iterating over the columns of the constant matrix
for i in range(3):
    hvp = torch.autograd.grad(h[i], model.parameters(), retain_graph=True)
    hvp_flat = torch.cat([g.contiguous().view(-1) for g in hvp])
    print(hvp_flat)
```

Here are two other thoughts:

Maybe you can keep the graph of `h` on GPU and only move to disk the part that computes the matrix-Jacobian product.

```python
model = quadratic_fun()
loss_quad = model()  # the forward graph stays on GPU

# only the tensors saved between set and reset (i.e. the graph built
# by the double backward) go to disk
torch.autograd.graph.set_saved_tensors_default_hooks(pack_hook, unpack_hook)
grad_ft = torch.autograd.grad(loss_quad, model.parameters(), create_graph=True)
flat_grad = torch.cat([g.contiguous().view(-1) for g in grad_ft])
torch.autograd.graph.reset_saved_tensors_default_hooks()

# generate the constant matrix V, and compute the matrix-gradient product
torch.manual_seed(0)
V = torch.randn((10, 3))
h = flat_grad @ V

# compute the matrix-Jacobian product by iterating over the columns of the constant matrix
for i in range(3):
    hvp = torch.autograd.grad(h[i], model.parameters(), retain_graph=True)
    hvp_flat = torch.cat([g.contiguous().view(-1) for g in hvp])
    print(hvp_flat)
```

or even move to CPU instead of to disk:

```python
model = quadratic_fun()

# the graph built inside the context keeps its saved tensors on CPU
with torch.autograd.graph.save_on_cpu():
    loss_quad = model()
    grad_ft = torch.autograd.grad(loss_quad, model.parameters(), create_graph=True)
    flat_grad = torch.cat([g.contiguous().view(-1) for g in grad_ft])

# generate the constant matrix V, and compute the matrix-gradient product
torch.manual_seed(0)
V = torch.randn((10, 3))
h = flat_grad @ V

for i in range(3):
    hvp = torch.autograd.grad(h[i], model.parameters(), retain_graph=True)
    hvp_flat = torch.cat([g.contiguous().view(-1) for g in hvp])
    print(hvp_flat)
```

Thanks Victor for the additional thoughts. These two code snippets do work!

There is one issue, though: I tried the functions you provided with a recent nightly build (1.10.0.dev20210805+cu102). Probably because of some incompatibility between this version and my cudatoolkit, `torch.autograd.grad()` is much slower than before (in one instance, 40 seconds vs 5 seconds). Is there a PyTorch+cudatoolkit version combination that you would recommend I try? Thanks!

Sorry, I'm not very familiar with the versions of Python and cudatoolkit, but there should not be a slowdown with the more recent versions!
Is `torch.autograd.grad()` much slower with the latest version of PyTorch even when you don't use the hooks?
It is expected that using the hooks will incur a performance penalty.
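
To check whether the hooks are the culprit, you could time the call with and without them, along these lines (a sketch; `loss_quad` and `model` stand in for your training code):

```python
import time

torch.cuda.synchronize()  # make sure pending kernels don't pollute the timing
start = time.time()
grad_ft = torch.autograd.grad(loss_quad, model.parameters(), create_graph=True)
torch.cuda.synchronize()  # wait for the backward kernels to finish
print(f"grad took {time.time() - start:.3f}s")
```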

Gotcha, thanks! Yes, `torch.autograd.grad()` has become much slower even when I don't use the hooks. I am reinstalling everything and just wondered whether you had any idea which version combination would work.

I'll tell you how much of a penalty the hooks incur once I get the proper PyTorch+CUDA toolkit versions installed and up and running.