How does PyTorch deal with the sparse Jacobian matrix in jvp/vjp during autograd?

I’m using PyTorch for a least-squares problem, and one step needs the Jacobian of y w.r.t. x.
This causes huge memory usage, because PyTorch stores the (sparse) Jacobian as a dense matrix (plus the other memory used while computing it), even though the computation itself is quite simple.
Like this:

import torch
from torch.func import jacfwd  # on older versions: from functorch import jacfwd

x_ = torch.ones(100000, requires_grad=True)

def func(x__):
    return x__ * 2

y = func(x_)
print(torch.autograd.functional.jacobian(func, x_))
print(jacfwd(func)(x_))

And I get this error:
DefaultCPUAllocator: not enough memory: you tried to allocate 40000000000 bytes.
But we can still get the result of autograd.grad() without using much memory. I’m curious how PyTorch deals with the Jacobian matrix in the jvp/vjp operations.
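For example, a reverse-mode vector-Jacobian product through autograd.grad() runs fine; this minimal sketch (v is just a cotangent vector I picked for illustration) only ever holds vectors of size 100000, never the full 100000 x 100000 matrix:

import torch

x_ = torch.ones(100000, requires_grad=True)
y = x_ * 2

# VJP: v^T @ J, computed without materializing J
v = torch.ones_like(y)
vjp_result, = torch.autograd.grad(y, x_, grad_outputs=v)
print(vjp_result.shape)  # torch.Size([100000]), memory stays small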
In chumpy, the auto-diff just builds the Jacobian as a sparse matrix. Does PyTorch use a similar method, or does it define a specific way to compute the jvp/vjp for each operation?
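The forward-mode product behaves the same way for me. A minimal sketch, assuming torch.func is available (as the jacfwd call above implies), where the Jacobian-vector product is returned without the full matrix ever being built:

import torch
from torch.func import jvp

x_ = torch.ones(100000)

def func(x__):
    return x__ * 2

# JVP: J @ v, again without materializing J
v = torch.ones_like(x_)
out, jvp_result = jvp(func, (x_,), (v,))
print(jvp_result.shape)  # torch.Size([100000])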
It’s quite hard to get the memory usage under control when I don’t know how things work under the hood.