LBFGS on a multi-GPU machine

Hi,

Does LBFGS work on a multi-GPU (CUDA) machine?
The PyTorch documentation says that all parameters have to be on a single device.
Does DDP + LBFGS work?
I often get NaN values during LBFGS optimization.

Thanks,
Joe

Are you getting NaN values when using DDP + LBFGS, or with LBFGS by itself?

For DDP, you can run each data-parallel replica as its own process on its own GPU. All of a replica's parameters then sit on a single device, so LBFGS should work within each replica.
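A minimal sketch of that one-process-per-GPU setup, assuming a toy linear-regression model and synthetic per-rank data. The model, data, hyperparameters and the helper name `run_replica` are hypothetical, not from this thread. It falls back to the gloo backend on CPU when there are not enough GPUs, so it can be tried with a world size of 1. One design choice is also an assumption: the closure all-reduces the loss, so every rank's LBFGS line search sees an averaged loss that matches the DDP-averaged gradients. Because that value is identical on all ranks, the ranks should make the same number of closure calls, and each call triggers a gradient all-reduce.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def run_replica(rank: int, world_size: int) -> float:
    """One data-parallel replica: its own process, its own device (hypothetical helper)."""
    use_cuda = torch.cuda.is_available() and torch.cuda.device_count() >= world_size
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl" if use_cuda else "gloo", rank=rank, world_size=world_size)
    device = torch.device(f"cuda:{rank}") if use_cuda else torch.device("cpu")
    if use_cuda:
        torch.cuda.set_device(device)

    torch.manual_seed(0)  # identical initial weights on every rank
    model = nn.Linear(4, 1).to(device)
    ddp_model = DDP(model, device_ids=[rank] if use_cuda else None)

    # Hypothetical synthetic shard: each rank draws different inputs.
    gen = torch.Generator().manual_seed(rank)
    x = torch.randn(64, 4, generator=gen).to(device)
    true_w = torch.tensor([[1.0], [-2.0], [0.5], [3.0]], device=device)
    y = x @ true_w

    # Every parameter of this replica lives on one device, as LBFGS expects.
    optimizer = torch.optim.LBFGS(
        ddp_model.parameters(), lr=1.0, max_iter=20, line_search_fn="strong_wolfe"
    )
    loss_fn = nn.MSELoss()

    def closure() -> torch.Tensor:
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(x), y)
        loss.backward()  # DDP all-reduces (averages) gradients across ranks here
        # Assumption of this sketch: average the loss too, so the value LBFGS
        # sees matches the averaged gradients and is identical on all ranks.
        global_loss = loss.detach().clone()
        dist.all_reduce(global_loss, op=dist.ReduceOp.SUM)
        global_loss /= world_size
        if not torch.isfinite(global_loss):
            # Same value on every rank, so all ranks stop together.
            raise FloatingPointError(f"non-finite loss on rank {rank}")
        return global_loss

    final_loss = float("nan")
    for _ in range(5):
        final_loss = float(optimizer.step(closure))
    dist.destroy_process_group()
    return final_loss
```

To launch N replicas, one per GPU, something like `torch.multiprocessing.spawn(run_replica, args=(N,), nprocs=N, join=True)` under an `if __name__ == "__main__":` guard should work. Without the loss all-reduce, each rank's line search would compare its local loss against averaged gradients. Whether that contributes to the NaNs reported here is not established in this thread.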

The NaNs happen only with LBFGS under DDP. It works well on a single GPU.