Can I use CPU memory to save GPU memory in a sequential model?

For example, I have a big module "BigNet" and two GPUs, and each GPU's memory only allows me to train one module. I know I can do this:

net1 = BigNet().to(gpu1)  # parameters on GPU 1, updated by optim1
net2 = BigNet().to(gpu2)  # parameters on GPU 2, updated by optim2
X = X.to(gpu1)
y = net1(X)
y = net2(y.to(gpu2))
loss = loss_fn(y, label.to(gpu2))  # label must be on the same device as y
loss.backward()
optim1.step()
optim2.step()

This way, I can train a bigger model split into two modules. But gpu1 sits idle while gpu2 is working. Can I offload net1's memory to the CPU during that time, train a net3 on gpu1, and copy net1 back from CPU memory to the GPU once the loss has been back-propagated down to net1? For example:

net1 = BigNet().to(gpu1)  # updated by optim1
net2 = BigNet().to(gpu2)  # updated by optim2
net3 = BigNet()           # updated by optim3, parameters start on CPU
X = X.to(gpu1)
y = net1(X)
y = net2(y.to(gpu2))
net1.to(cpu)   # offload net1's parameters to CPU while gpu1 is idle
net3.to(gpu1)  # bring net3 onto the freed gpu1
y = net3(y.to(gpu1))
loss = loss_fn(y, label.to(gpu1))
loss.backward(Control)  # hypothetical: back-propagate through net3 and net2 only
optim3.step()
optim2.step()
net3.to(cpu)
net1.to(gpu1)  # restore net1 before its part of the backward pass
loss.backward(Control)  # hypothetical: back-propagate through net1
optim1.step()
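(As far as I know, `loss.backward(Control)` does not exist in PyTorch. One way to split the backward pass into stages is to detach the activation at the stage boundary and back-propagate each piece separately. A minimal CPU-only sketch, with small `nn.Linear` layers standing in for BigNet:)

```python
import torch
import torch.nn as nn

# Toy stand-ins for BigNet; the staged backward below is the point.
net1 = nn.Linear(8, 8)
net2 = nn.Linear(8, 8)
net3 = nn.Linear(8, 8)

x = torch.randn(4, 8)
h1 = net1(x)
# Cut the graph at the boundary between stage 1 and stages 2/3.
h1_detached = h1.detach().requires_grad_()
h2 = net2(h1_detached)
y = net3(h2)
loss = y.sum()

# First backward: through net3 and net2 only; it stops at the leaf
# h1_detached (net1 could be offloaded to CPU during this part).
loss.backward()

# Second backward: feed the boundary gradient into net1's graph
# (after moving net1 back to the GPU in the real setup).
h1.backward(h1_detached.grad)
```

After both calls, the parameters of all three stages have gradients, so the three optimizers can each step at the appropriate point.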

Thanks

You could check out CPU offloading for your use case, e.g. `torch.autograd.graph.save_on_cpu`.
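A minimal sketch of using `torch.autograd.graph.save_on_cpu` (the toy `nn.Sequential` model and sizes are placeholders; the example falls back to the CPU when no GPU is present):

```python
import torch
import torch.nn as nn

dev = "cuda" if torch.cuda.is_available() else "cpu"
net = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64)).to(dev)
opt = torch.optim.SGD(net.parameters(), lr=0.1)

x = torch.randn(8, 64, device=dev)
# Inside this context, activations saved for backward are stored in
# (pinned) CPU memory instead of GPU memory, and copied back to the
# GPU lazily when backward needs them.
with torch.autograd.graph.save_on_cpu(pin_memory=(dev == "cuda")):
    y = net(x)
loss = y.pow(2).mean()
loss.backward()
opt.step()
```

This trades GPU memory for host-device transfer time, which fits the "gpu1 is idle anyway" situation described above.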

Thanks, the hooks and the "save on CPU" approach work for me.
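(The hooks mentioned here are presumably `torch.autograd.graph.saved_tensors_hooks`, which let you pack each saved activation to CPU yourself. A sketch with a toy model, falling back to CPU when no GPU is available:)

```python
import torch
import torch.nn as nn

dev = "cuda" if torch.cuda.is_available() else "cpu"

def pack_to_cpu(t):
    # Called whenever autograd saves a tensor for backward: move it to CPU.
    return t.to("cpu")

def unpack_from_cpu(t):
    # Called when backward needs the saved tensor again: move it back.
    return t.to(dev)

net = nn.Sequential(nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 32)).to(dev)
x = torch.randn(4, 32, device=dev)

with torch.autograd.graph.saved_tensors_hooks(pack_to_cpu, unpack_from_cpu):
    y = net(x)
y.sum().backward()
```

`save_on_cpu` is essentially a prepackaged pair of such hooks, so the two approaches are interchangeable here.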